Linked Data

By Maurizio Farina | Posted on October 2017 | DRAFT

This tutorial gives an overview of Linked Data. Linked Data is interlinked structured data that can be used for semantic queries. Structured data (often called a dataset) is generally extracted from unstructured data. One example is DBpedia, which is extracted from Wikipedia.
The term was coined by Tim Berners-Lee for the Semantic Web project.


"Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak."

Linked Data plays a key role in the Semantic Web.

Key technologies

| Technology | Description |
| --- | --- |
| URI | Uniform Resource Identifier: an identifier for a resource. |
| RDF | Resource Description Framework: a collection of W3C specifications used to describe things and their relationships with other things. |
| Dataset | A generic term for a collection of data. Here, Linked Data are datasets that follow the Linked Data specifications. |
| SPARQL | A recursive acronym for SPARQL Protocol and RDF Query Language: an RDF query language able to retrieve and manipulate data stored in RDF. Many semantic platforms expose a SPARQL endpoint for querying RDF databases. |

A SPARQL query can consist of triple patterns, conjunctions, disjunctions, and optional patterns.
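The pattern types listed above can be seen together in one small query. The following is a minimal sketch, not a definitive recipe: it assumes DBpedia's public endpoint at `https://dbpedia.org/sparql` and the DBpedia properties `dbo:birthPlace`, `dbo:deathPlace`, and `dbo:populationTotal`, none of which come from this tutorial. The helper `query_url` is a hypothetical name; it only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: the dbo: properties below are illustrative choices.
QUERY = """\
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?place ?label ?population WHERE {
  # Disjunction: either alternative may match.
  { dbr:Dante_Alighieri dbo:birthPlace ?place }
  UNION
  { dbr:Dante_Alighieri dbo:deathPlace ?place }

  # Conjunction: this triple pattern must match as well.
  ?place rdfs:label ?label .

  # Optional pattern: binds ?population when the data has it,
  # and keeps the row when it does not.
  OPTIONAL { ?place dbo:populationTotal ?population }
}
"""


def query_url(endpoint: str, query: str) -> str:
    """Build a GET URL that passes the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
```

Opening `url` in a browser or fetching it with an HTTP client should return the result set, provided the endpoint is reachable and the assumed properties hold data for this resource.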


In information technology, one common form of URI is the URL (Uniform Resource Locator), generally used to identify a resource across the Web (a web address, a RESTful web service, and so on). URIs are important because they make it possible to refer to a thing inside Linked Data uniquely.


The RDF specifications are based on triples: statements about web resources (and other things) in the form subject-predicate-object.

For Java developers, the simplest way to understand RDF is to think of a class with its attributes: RDF says subject instead of entity, predicate instead of attribute, and object instead of attribute value.
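The class analogy above can be made concrete with a few lines of code. The sketch below uses Python rather than Java for brevity; `Triple`, `Poet`, and `to_triples` are hypothetical names introduced only for illustration. It also simplifies one point: real RDF predicates are URIs, while here plain attribute names stand in for them.

```python
from dataclasses import dataclass, fields
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # the entity the statement is about
    predicate: str  # the attribute name
    object: str     # the attribute value


@dataclass
class Poet:
    uri: str          # identifies the entity; becomes the subject
    name: str
    birth_place: str


def to_triples(entity) -> List[Triple]:
    """Turn each attribute (except the identifier) into one triple."""
    return [
        Triple(entity.uri, f.name, getattr(entity, f.name))
        for f in fields(entity)
        if f.name != "uri"
    ]


dante = Poet(
    uri="http://dbpedia.org/resource/Dante_Alighieri",
    name="Dante Alighieri",
    birth_place="Florence",
)
triples = to_triples(dante)
```

Each element of `triples` reads as one statement: the object with this URI has this attribute with this value.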

RDF has several serialization formats:

| Format | Description |
| --- | --- |
| Turtle | A compact, human-friendly format. |
| N-Triples | A very simple, easy-to-parse, line-based format that is not as compact as Turtle. |
| N-Quads | A superset of N-Triples, for serializing multiple RDF graphs. |
| JSON-LD | A JSON-based serialization. |
| N3 or Notation3 | A non-standard serialization that is very similar to Turtle, but has some additional features, such as the ability to define inference rules. |
| RDF/XML | An XML-based syntax that was the first standard format for serializing RDF. |
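To make the two line-based formats more tangible, here is a minimal sketch that writes one statement as an N-Triples line and as an N-Quads line. The helper names (`iri`, `literal`, `n_triple`, `n_quad`) and the graph name `http://example.org/graphs/poets` are hypothetical. The escaping covers only backslash, double quote, newline, and carriage return, so it is an illustration rather than a complete serializer.

```python
def iri(value: str) -> str:
    """Write an IRI term: the address wrapped in angle brackets."""
    return f"<{value}>"


def literal(value: str, lang: str = "") -> str:
    """Write a string literal, with an optional language tag."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    suffix = f"@{lang}" if lang else ""
    return f'"{escaped}"{suffix}'


def n_triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples line: three terms followed by a full stop."""
    return f"{subject} {predicate} {obj} ."


def n_quad(subject: str, predicate: str, obj: str, graph: str) -> str:
    """One N-Quads line: the same triple plus the graph it belongs to."""
    return f"{subject} {predicate} {obj} {graph} ."


dante = iri("http://dbpedia.org/resource/Dante_Alighieri")
label = iri("http://www.w3.org/2000/01/rdf-schema#label")

triple_line = n_triple(dante, label, literal("Dante Alighieri", "en"))
quad_line = n_quad(
    dante,
    label,
    literal("Dante Alighieri", "en"),
    iri("http://example.org/graphs/poets"),  # hypothetical graph name
)
```

The N-Quads line differs from the N-Triples line only by the fourth term, which is what lets a single file carry several graphs.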

To get an idea of the serialization formats, you can query DBpedia for Dante Alighieri and export the RDF in the different formats.
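One way to try this from code is HTTP content negotiation: request the same resource URI several times and change only the `Accept` header. The sketch below is written under assumptions. The media types are the registered ones for each format, but whether DBpedia honors every one of them for this resource is not confirmed here. `build_request` and `fetch` are hypothetical helper names, and only `fetch` touches the network.

```python
from urllib.request import Request, urlopen

# Registered media types for the serialization formats listed above.
MEDIA_TYPES = {
    "Turtle": "text/turtle",
    "N-Triples": "application/n-triples",
    "N-Quads": "application/n-quads",
    "JSON-LD": "application/ld+json",
    "N3": "text/n3",
    "RDF/XML": "application/rdf+xml",
}

DANTE = "http://dbpedia.org/resource/Dante_Alighieri"


def build_request(resource_uri: str, fmt: str) -> Request:
    """Prepare a GET request that asks for one serialization format."""
    return Request(resource_uri, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(resource_uri: str, fmt: str, timeout: float = 10.0) -> str:
    """Send the request and return the body as text (needs network access)."""
    with urlopen(build_request(resource_uri, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_request(DANTE, "Turtle")
```

Calling `fetch(DANTE, "Turtle")` and then `fetch(DANTE, "N-Triples")` should make the difference in compactness between the two formats visible, assuming the server supports both.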


Ready-made datasets published as Linked Data can be found around the web in different formats. For this project I used DBpedia. The following list contains some helpful sources for finding other useful datasets:

## References

| Resource | Description |
| --- | --- |
| Apache Jena | A free and open source Java framework for building Semantic Web and Linked Data applications. Highlights: RDF API (create and read RDF), ARQ (a SPARQL 1.1 query engine for RDF), TDB (a high-performance triple store). |