Linked Data – a tutorial with Computer Grandma

Linked data is a way to publish data on the web. There are some technicalities involved, and it’s not as easy to grasp as a CSV file, but the standard is flexible and powerful enough to allow endless possibilities. Once you get the point there is no turning back: you will be left speechless and daydreaming, as if science fiction were becoming real. To understand linked data you should get familiar with three ideas:

A – assign a web address to anything
B – express facts in triples
C – merge the two above and go online

Follow along so Computer Grandma and I can guide you through.

[Image: Computer Grandma]

A – Assign a web address to anything

The first step is to give an HTTP address to anything: your city, the bus, the office, a moment in time, yourself and even your grandma’s apple pie. Anything can be labeled by a web address (http://something):

  • http://example.com/me
  • http://example.com/grandma
  • http://cookingsite.net/pie
  • http://foodnoprofit.info/apple

Just as with concrete things, we can also name abstract concepts and relations, for example authorship of an artwork or acquaintance between people:

  • http://concepts.org/Person (a generic person)
  • http://cookingsite.net/author (A is author of B)
  • http://example.com/knows (A knows B)

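To let people across the world use a shared set of abstract but common concepts and relations, a number of predefined vocabularies have been published. One of the most used is schema.org, which you can use to express the three concepts above:

  • http://schema.org/Person (a generic person)
  • http://schema.org/author (A is author of B)
  • http://schema.org/knows (A knows B)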

Some important properties come from older, foundational vocabularies, for example the “is a” property, whose address is http://www.w3.org/1999/02/22-rdf-syntax-ns#type.

B – Express facts in triples

We want all those things and relations to form a knowledge base, a set of facts. The simplest and least presumptuous way to express a fact is the triple, a statement with three parts: subject, predicate, object. The triples saying that my grandma is a person, related to me, born in 1934, who made an apple pie are:

grandma (subject), is a (predicate), person (object)
grandma (subject), is related to (predicate), me (object)
grandma (subject), was born in (predicate), 1934 (object)
pie (subject), was made by (predicate), grandma (object)

Every entity can have other properties and relations. Grandma stole my apple to make her recipe:

pie (subject), is a (predicate), recipe (object)
pie (subject), has ingredient (predicate), apple (object)
me (subject), owns (predicate), apple (object)
me (subject), is a (predicate), person (object)

As you can guess, it’s easy to go wild with this system. All of the triples above form a graph (a network), which we want to publish on the web.
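Before going online, here is a minimal sketch of how that set of facts could be held in a program. It is only an illustration, not part of any linked data standard: the names `TRIPLES`, `build_graph` and `match` are made up for this tutorial, and plain words stand in for the web addresses.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) tuple, copied from the lists above.
TRIPLES = [
    ("grandma", "is a", "person"),
    ("grandma", "is related to", "me"),
    ("grandma", "was born in", "1934"),
    ("pie", "was made by", "grandma"),
    ("pie", "is a", "recipe"),
    ("pie", "has ingredient", "apple"),
    ("me", "owns", "apple"),
    ("me", "is a", "person"),
]


def build_graph(triples):
    """Index triples by subject: node -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


graph = build_graph(TRIPLES)

# Who is a person? Match every triple of the form (?, "is a", "person").
people = [s for (s, _, _) in match(TRIPLES, predicate="is a", obj="person")]
print(people)        # ['grandma', 'me']

# Everything we know about the pie, as outgoing edges of the "pie" node.
print(graph["pie"])
```

Asking a question of the graph is just filling in the parts of a triple you know and leaving the rest blank, which is roughly the idea behind real graph query languages too.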

 

[Figure: the grandma graph]

 

C – A World Wide Graph Database

Let’s combine the first and second step, to finally see the grandma graph expressed in linked data:

<http://example.com/grandma> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
<http://example.com/grandma> <http://schema.org/relatedTo> <http://example.com/me> .
<http://example.com/grandma> <http://schema.org/birthDate> "1934" .
<http://cookingsite.net/pie> <http://schema.org/author> <http://example.com/grandma> .
<http://cookingsite.net/pie> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Recipe> .
<http://cookingsite.net/pie> <http://schema.org/recipeIngredient> <http://foodnoprofit.info/apple> .
<http://example.com/me> <http://schema.org/owns> <http://foodnoprofit.info/apple> .
<http://example.com/me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .

This specific syntax for expressing triples (angle brackets and dots) is N-Triples, but there are many others. All of them are ways of writing down the same data model, the Resource Description Framework (RDF). In the same way HTML allows us to express documents in a shared format, RDF allows us to express data. You can write documents and data in a lot of ways, but the key feature differentiating web formats from others is our lord, the link. In HTML you travel from one document to another with an <a> tag; in RDF you travel from one dataset to another through any of the three elements forming a triple. Please appreciate!
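To make the brackets-and-dots syntax less mysterious, here is a rough sketch of how a few of the grandma triples could be written out as N-Triples from a program. It is a simplification under stated assumptions: the helpers `term` and `to_ntriples` are invented for this example, anything starting with http:// or https:// is treated as a web address and everything else as a plain text value, and only quotes and backslashes are escaped. A real RDF library handles many more cases.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term(value):
    """Format one element: web addresses become <...>, anything else a "literal"."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Simplified escaping: only backslashes and double quotes are handled here.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Write one statement per line, each terminated by a space and a dot."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


triples = [
    ("http://example.com/grandma", RDF_TYPE, "http://schema.org/Person"),
    ("http://example.com/grandma", "http://schema.org/birthDate", "1934"),
    ("http://cookingsite.net/pie", "http://schema.org/author", "http://example.com/grandma"),
]

text = to_ntriples(triples)
print(text)
```

Notice that the third line points from cookingsite.net to example.com: that single address is the link between two datasets living on different servers.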

The protocol to serve and consume RDF data across a vast network of computers is already in place: it’s called HTTP and you are using it right now to read this tutorial. You can see linked data as an extension of the traditional web or, if you are a radical like me, as a superset of it. Linked data comes with the same essential freedom as the web: anybody can start a server and publish their stuff, link and get linked, with no central authority.
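Here is a small sketch of what asking a server for data rather than a web page could look like, using only Python’s standard library. Take it with some caution: the helper names `rdf_request` and `fetch_rdf` are made up for this example, the address is a placeholder from this tutorial that will not actually return RDF, and asking for the `application/n-triples` media type through the Accept header is one common convention that a given server may or may not honor.

```python
from urllib.request import Request, urlopen


def rdf_request(url, media_type="application/n-triples"):
    """Build a plain HTTP GET request that asks for RDF instead of HTML."""
    return Request(url, headers={"Accept": media_type})


def fetch_rdf(url):
    """Send the request and return the body as text (needs network access)."""
    with urlopen(rdf_request(url), timeout=10) as response:
        return response.read().decode("utf-8")


# Build the request without sending it, just to look at what would go out.
request = rdf_request("http://example.com/grandma")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Nothing here is special to linked data: it is the same kind of request a browser sends, with a different answer to the question “which format would you like?”.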

Linked Data, once called the semantic web, is doing for data what the web did for documents. Our grandma graph is a fun example, but this technology is no joke. Take a moment to imagine a global data graph containing monetary transactions, scientific discoveries, government laws and everything else you can name. All of it interlinked, with the possibility to flow “from one domain into another”, as Tim Berners-Lee, the inventor of both the web and linked data, says in this video (at 2:50).

Conclusion and a personal note

For a while I was upset that Computer Grandma used my apple to cook the pie. But her higher aim was to restore our faith in technology, once again. And by the way, the pie was tasty. I hope Grandma helped you understand the expressivity and wide implications of linked data. The best-case scenario is that you are so excited about all this that you are already searching for material to learn more. I’m happy if that happened. Thank you for reading, and please spread the word.

To keep this tutorial light I’m writing some personal opinions on a separate page. It’s about the relation between linked data and neural networks, and about linked data as a fundamental component of strong AI. Stay tuned!