
At work we recently completed a task to generate a data file that maps each location to its nearby locations. Our input is a Postgres table of location data, including name, latitude and longitude. Our target output is a YAML file in which each location is mapped to the locations that are geographically close to it.

With the PostGIS extension, PostgreSQL has excellent support for calculating the distance between locations, so we delegated the distance calculations to the SQL query. Essentially, the query joined each location with every other location and excluded the pairs that exceeded our distance threshold (for example, 25,000 metres).
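The join-and-filter step can be sketched in plain Ruby. This is a minimal illustration, not the query we ran: it swaps the PostGIS distance calculation for a haversine formula so that it runs without a database, and the `Location` struct, the `distance_m` and `nearby_map` helpers and the sample coordinates are all made up for the example.

```ruby
require "yaml"

# Hypothetical record mirroring the columns mentioned above (name, latitude, longitude).
Location = Struct.new(:name, :lat, :lon)

EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius in metres

# Great-circle distance in metres between two locations (haversine formula).
# Assumption: this stands in for the PostGIS calculation, which is not shown here.
def distance_m(a, b)
  to_rad = Math::PI / 180.0
  dlat = (b.lat - a.lat) * to_rad
  dlon = (b.lon - a.lon) * to_rad
  h = Math.sin(dlat / 2)**2 +
      Math.cos(a.lat * to_rad) * Math.cos(b.lat * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h))
end

# Pair each location with every other location and keep only the names of
# those within the distance threshold.
def nearby_map(locations, threshold_m)
  locations.each_with_object({}) do |loc, result|
    result[loc.name] = locations
      .reject { |other| other.equal?(loc) }
      .select { |other| distance_m(loc, other) <= threshold_m }
      .map(&:name)
  end
end

# Made-up sample data, for illustration only.
SAMPLE = [
  Location.new("Sydney", -33.8688, 151.2093),
  Location.new("Parramatta", -33.8150, 151.0011),
  Location.new("Melbourne", -37.8136, 144.9631)
]

# Emit the mapping as YAML, the same kind of output the task produces.
puts nearby_map(SAMPLE, 25_000).to_yaml
```

With a 25,000 metre threshold, Sydney and Parramatta end up listed as near each other, while Melbourne maps to an empty list. Comparing every pair like this is quadratic in the number of locations, which is part of why doing the filtering inside the database looked attractive.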

The task was implemented as a Rake task, and it took on the order of 8 hours to run. We assumed that most of that time was spent by Postgres on the GIS calculations. Not so.

I reimplemented the task in Scala using Scweery. The Scala version differs from the Rake task in several ways. It is strongly typed (as necessitated by the language) and, apart from IO operations, purely functional (my personal preference). It also uses JDBC instead of ActiveRecord. Apart from these differences, the algorithm is identical to its Ruby counterpart.

So did the Scweery/Scala version outperform Rake? Yes, without a doubt. /usr/bin/time reports that the Scala version completed in 8m 55s. Against a run of roughly 8 hours, that’s at least 50 times quicker. Importantly, it turns what was an overnight task into a coffee break task.
