Geocoding: converting addresses to coordinates

What is geocoding?

Since people and things are located somewhere on the surface of the planet, having a precise numerical position for them enables many powerful applications. In the real world you will often be given addresses rather than coordinates, so you will need geocoding to convert a verbal description of a place into latitude and longitude.

"Via della Farfalla 32, 00155 Roma" => [lat: 41.895, lon: 12.585]

How does geocoding work?

The same address can be written in a variety of ways. Note how the components vary in order, language and level of detail:

Via della Farfalla 32, 00155 Roma
via farfalla 32 RM
32 Farfalla street, Rome, p.o. 00155

The address (1 in the diagram below) is first converted to a structured format (each geocoder uses its own), in which the street name, house number, locality and postal code are explicitly separated (2). This structure is then matched against a Geographic Information System (GIS) to find the coordinates (3).

[Diagram: geocoding]
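To make step (2) concrete, here is a minimal sketch of turning one written form of the address into separate fields. The `StructuredAddress` class, the `parse_address` function and the regular expression are hypothetical names invented for this illustration, and the pattern only understands the single layout "street number, postal code locality". Real geocoders use far more elaborate parsing, which is exactly why the other two spellings above are not recognised here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredAddress:
    """Hypothetical structured format; every geocoder defines its own."""
    street: str
    house_number: str
    postal_code: str
    locality: str


# Toy pattern for the layout "<street> <number>, <5-digit postal code> <locality>".
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>.+?)\s+(?P<number>\d+\w*)\s*,\s*"
    r"(?P<postal>\d{5})\s+(?P<locality>.+?)\s*$"
)


def parse_address(text: str) -> Optional[StructuredAddress]:
    """Return the separated components, or None if the layout is not recognised."""
    match = ADDRESS_PATTERN.match(text)
    if match is None:
        return None
    return StructuredAddress(
        street=match.group("street"),
        house_number=match.group("number"),
        postal_code=match.group("postal"),
        locality=match.group("locality"),
    )


parsed = parse_address("Via della Farfalla 32, 00155 Roma")
print(parsed)

# The abbreviated spelling has no comma and no postal code,
# so this toy pattern gives up on it.
unparsed = parse_address("via farfalla 32 RM")
print(unparsed)
```

Keeping the postal code as a string rather than a number preserves the leading zeros of "00155".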

Imagine a GIS as a big dataset of geometric shapes representing real places such as roads, intersections and buildings. Each shape is associated with a structured address and is geometrically defined by a set of nodes, each with a latitude and longitude. So if you know an address you can obtain its coordinates, and if you know the coordinates you can obtain an address (the latter process is called reverse geocoding). Let's see an example of such a shape in the OpenStreetMap XML data format:

<way id="27509641" visible="true" version="3" changeset="13202660" timestamp="2012-09-22T00:33:14Z" user="Davio" uid="217070">
  <nd ref="302010593"/>
  <nd ref="1166474858"/>
  <nd ref="1166475062"/>
  <nd ref="302010643"/>
  <tag k="highway" v="residential"/>
  <tag k="lit" v="yes"/>
  <tag k="name" v="Via della Farfalla"/>
</way>

That was Via della Farfalla, a residential street made up of a number of nodes (nd). Each node in turn carries its coordinates and metadata such as the contributor and the latest update:

<node id="302010593" visible="true" version="2" changeset="7364009" timestamp="2011-02-22T14:43:46Z" user="Davio" uid="217070" lat="41.8958580" lon="12.5854057"/>
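The link between a way and its nodes can be followed in code. The sketch below parses the two elements shown above with Python's standard `xml.etree.ElementTree` and resolves the street's node references to coordinates. The `<osm>` wrapper element and the `way_summary` helper are additions made for this example. Only the one node quoted in this article is included, so the other three references stay unresolved.

```python
import xml.etree.ElementTree as ET

# The way and the single node quoted in the text, wrapped in a root element.
OSM_XML = """
<osm>
  <node id="302010593" visible="true" version="2" changeset="7364009"
        timestamp="2011-02-22T14:43:46Z" user="Davio" uid="217070"
        lat="41.8958580" lon="12.5854057"/>
  <way id="27509641" visible="true" version="3" changeset="13202660"
       timestamp="2012-09-22T00:33:14Z" user="Davio" uid="217070">
    <nd ref="302010593"/>
    <nd ref="1166474858"/>
    <nd ref="1166475062"/>
    <nd ref="302010643"/>
    <tag k="highway" v="residential"/>
    <tag k="lit" v="yes"/>
    <tag k="name" v="Via della Farfalla"/>
  </way>
</osm>
"""

root = ET.fromstring(OSM_XML)

# Index every node by id so that way references can be looked up quickly.
nodes = {
    node.get("id"): (float(node.get("lat")), float(node.get("lon")))
    for node in root.iter("node")
}


def way_summary(way):
    """Return (name, node references, coordinates of the references we know)."""
    tags = {tag.get("k"): tag.get("v") for tag in way.findall("tag")}
    refs = [nd.get("ref") for nd in way.findall("nd")]
    coords = [nodes[ref] for ref in refs if ref in nodes]
    return tags.get("name"), refs, coords


name, refs, coords = way_summary(root.find("way"))
print(name)    # street name taken from the "name" tag
print(refs)    # the four node ids referenced by the way
print(coords)  # coordinates for the nodes present in this snippet
```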

The fun part is that this information is directly available at web addresses. Check both the street and the specific node. If you feel brave, you can download dumps of the whole world's data, both as XML and as linked data.
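As a rough illustration of "available at web addresses", the sketch below builds the address of a single element and downloads its XML. It assumes the element endpoints of the OpenStreetMap API version 0.6 (`/api/0.6/<type>/<id>`). The helper names and the User-Agent string are placeholders made up for this example, and the elements may have been edited or deleted since this article was written.

```python
import urllib.request

# Assumption: base address of the OpenStreetMap editing API, version 0.6.
OSM_API_BASE = "https://api.openstreetmap.org/api/0.6"


def element_url(kind: str, element_id: int) -> str:
    """Build the address of one OSM element (node, way or relation)."""
    if kind not in {"node", "way", "relation"}:
        raise ValueError(f"unknown element type: {kind!r}")
    return f"{OSM_API_BASE}/{kind}/{element_id}"


def fetch_element_xml(kind: str, element_id: int, timeout: float = 10.0) -> str:
    """Download the XML description of one element (needs network access)."""
    request = urllib.request.Request(
        element_url(kind, element_id),
        # Placeholder identification string; replace it with your own.
        headers={"User-Agent": "geocoding-tutorial-example"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# The street and the node used as examples in this article.
street_url = element_url("way", 27509641)
node_url = element_url("node", 302010593)
print(street_url)
print(node_url)
# To actually download the data: print(fetch_element_xml("node", 302010593))
```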

Geocoding services

Even though the data is publicly available (thank you, OSM!), the algorithms for good geocoding are quite complex. Luckily there are web APIs that offer free geocoding, under some common conditions:

  • Registration: to use the service you must sign up and obtain an API key (a password)
  • Rate limiting: you can geocode a fixed number of addresses per minute, or a fixed total number of addresses
  • Data license: you may not be allowed to store the responses returned by the geocoder or to use them commercially
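A per-minute rate limit is usually respected on the client side by spacing out the requests. Here is a minimal sketch of that idea. The `Throttle` class is a hypothetical helper written for this article, not part of any vendor's library, and the quota passed to it is only an example value. Check your provider's terms for the real figure.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_minute happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute  # seconds between two calls
        self._clock = clock    # injectable, so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


if __name__ == "__main__":
    # Example quota (an assumption): 600 requests per minute, one every 0.1 s.
    throttle = Throttle(max_per_minute=600)
    for address in ["Via della Farfalla 32, 00155 Roma", "via farfalla 32 RM"]:
        throttle.wait()
        print("would geocode:", address)  # the actual HTTP request would go here
```

A limit on the total number of addresses is a separate condition. It needs a simple counter rather than a delay.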

Geocoding services operate on data coming from non-profits (e.g. OpenStreetMap) or companies (e.g. Google). Corporate data, and the algorithms built on it, are often of higher quality, but the collaborative resources keep getting better. Lately a number of open source geocoders have been published, and they look very promising.

Example of geocoding with an API

The procedure is very similar across vendors. After registering, you get an API key. To obtain the coordinates of a place you send an HTTP request whose syntax differs only slightly from one vendor to the next:

  • Google:
    https://maps.googleapis.com/maps/api/geocode/json?address=ADDRESS&key=YOUR_API_KEY
  • OpenCage:
    https://api.opencagedata.com/geocode/v1/json?q=ADDRESS&key=YOUR_API_KEY
  • MapBox:
    https://api.mapbox.com/geocoding/v5/mapbox.places/ADDRESS.json?access_token=YOUR_API_KEY
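In practice the address has to be URL-encoded before it goes into the request, and the coordinates have to be pulled out of the JSON response. The sketch below does both for the Google syntax. The function names are invented for this example, and `SAMPLE_RESPONSE` is a hand-written, heavily abridged stand-in for a real reply. It assumes the coordinates sit under `results[0].geometry.location` with keys `lat` and `lng`, so check the vendor's documentation before relying on it.

```python
import json
from urllib.parse import urlencode

GOOGLE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_google_url(address: str, api_key: str) -> str:
    """Build the request URL, encoding spaces and commas in the address."""
    query = urlencode({"address": address, "key": api_key})
    return f"{GOOGLE_ENDPOINT}?{query}"


def extract_lat_lon(payload: dict):
    """Return (lat, lon) of the first result, or None if nothing was found."""
    results = payload.get("results") or []
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written sample (an assumption about the response shape, not real output).
SAMPLE_RESPONSE = json.loads("""
{
  "status": "OK",
  "results": [
    {"geometry": {"location": {"lat": 41.895, "lng": 12.585}}}
  ]
}
""")

url = build_google_url("Via della Farfalla 32, 00155 Roma", "YOUR_API_KEY")
print(url)
print(extract_lat_lon(SAMPLE_RESPONSE))
```

Sending the request is then a matter of passing `url` to any HTTP client and feeding the decoded JSON to `extract_lat_lon`.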