Part 1 – Modules

      Introduction
      Understanding data models
      Possibilities and limitations of RDF
      Data quality
      Data profiling and cleaning
      Vocabulary reconciliation
      Metadata enriching
      REST
      Decentralization and federation
      Conclusions
  
  
  
    
    
      (Non-)information resources 
    
      
        Pointing to resources with a common name is ambiguous.
          Washington (city? state? president?)
         
       
      
        Reuse or mint a URL for them.
        
       
      
        Machines especially need unambiguous identifiers.
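DBpedia, for instance, mints a distinct URL for each sense of “Washington”; the ambiguity disappears once a URL is chosen:

```
http://dbpedia.org/resource/Washington,_D.C.      (the city)
http://dbpedia.org/resource/Washington_(state)    (the state)
http://dbpedia.org/resource/George_Washington     (the president)
```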
       
     
   
  
    
      The URL is part of a broader family of identifiers.
    
      
        URL – Uniform Resource Locator
        
          unique identification and location of resources 
          mailto:ruben.verborgh@ugent.be  
       
      
        URN – Uniform Resource Name
        
          location-independent resource identifier 
          urn:isbn:0-83891251-6  
       
      
        URI – Uniform Resource Identifier
        
       
     
    
   
  
    
      The broadest family is the IRI, which allows most Unicode characters.
    
      
        Not all characters are allowed in a URI.
        
       
      
        IRI – Internationalized Resource Identifier
        
          Non-ASCII chars don’t need to be encoded. 
          Chars with other meaning still need encoding. 
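The difference can be sketched with Python’s standard library: percent-encoding turns an IRI’s non-ASCII characters into a valid URI, while characters with reserved meaning need encoding in both.

```python
from urllib.parse import quote

# In an IRI, "Citroën" is fine as-is; in a URI, the non-ASCII "ë"
# must be percent-encoded using its UTF-8 bytes.
print(quote("Citroën"))        # Citro%C3%ABn

# Characters with reserved meaning (such as "#") still need
# encoding when they are data rather than syntax.
print(quote("a#b", safe=""))   # a%23b
```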
         
       
     
    
   
  
    
      An HTTP URL identifies and locates
      a resource anywhere in the universe.
     
    
      
        A string is a unique identifier if at most one entity corresponds to it.
        
          
            A national number uniquely identifies a person, but does not locate them.
         
       
      
        A string is a unique locator if at most one location corresponds to it.
        
          
            A street address uniquely identifies a location, but not the entity found there.
         
       
     
   
  
    
      Using HTTP URLs ensures that identifiers can be looked up.
    
      
        An HTTP URI of a resource can be dereferenced into a representation.
        
       
      
        This relies on the double role of an HTTP URI as identifier and locator.
       
      
        Dereferencing is a core principle of Linked Data.
        
          
            If you don’t know something, look it up.
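As a sketch, dereferencing can be tried with any HTTP client; the representation returned depends on content negotiation (DBpedia, for example, answers a Turtle request with a redirect to the data document):

```shell
# Ask for a Turtle representation of the resource's URI;
# -L follows the redirect to the actual document.
curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Pablo_Picasso
```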
           
         
       
     
   
  
  
    
      By including links to other resources, data becomes more meaningful.
    
      
        Links connect a resource to known concepts.
        
       
      
        Links give meaning to data.
        
       
      
        Links allow exploration of related data.
        
          Find more by the same author.  
       
     
   
  
    
      An immense amount of Linked Data is already available.
    
      
        On the structural level, hundreds of vocabularies exist.
          
            They provide properties
            and classes to reuse.
           
          
            Although not always counted as Linked Data themselves, vocabularies are essential to it.
         
       
      
        On the content level, thousands of datasets are available.
          Strive to reuse identifiers rather than to mint new ones. 
         
       
     
   
  
    
      No Linked Data set is ever complete: this is the open-world assumption.
     
    
      
        Relational databases use highly rigid structures.
          
            A NULL value can signal missing data, but its meaning is ambiguous.
         
       
      
        With Linked Data, no source has all of the truth: another source might have more data on a subject.
        
          The absence of a fact does not  imply its falsehood. 
          A fact has two possible states: true and unknown.
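A small illustration in Turtle (the ex: resource and date are hypothetical): one source describes a person’s birth but says nothing about a death.

```turtle
@prefix ex:  <http://example.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# This source only knows a birth date.
ex:alice dbo:birthDate "1970-04-25"^^xsd:date .

# Under the open-world assumption, the absence of a dbo:deathDate
# triple does not mean ex:alice is alive: the fact is unknown,
# and another source might supply it.
```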
         
       
     
   
  
    
      Several vocabularies are used frequently across Linked Data.
    
      
        modeling vocabularies
        
       
      
        general-purpose vocabularies
        
       
      
        concept-specific vocabularies
        
       
     
    Find the one you need at Linked Open Vocabularies . 
   
  
    
      The Dublin Core terms are a set of generic metadata properties.
    
      
        Each property is generic, so it applies to many kinds of resources.
      
        Many applications use the Dublin Core terms.
        
          good interoperability of high-level semantics 
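For instance, a record described with Dublin Core terms might look like this in Turtle (the ex: resource and the literal values are hypothetical):

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .

ex:report42
    dcterms:title   "Annual Report" ;
    dcterms:creator "Alice Example" ;
    dcterms:subject "linked data" .
```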
         
       
     
   
  
    
      Schema.org is a single vocabulary covering many domains.
    
      
        Created and maintained by major search engines, it is widely adopted.
      
        Its concepts are defined rather loosely.
        
          This makes it flexible to use for developers. 
          Machines cannot derive much knowledge from it. 
         
       
      
        Schema.org is manually curated, but open for extension.
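Schema.org markup is commonly embedded in web pages as JSON-LD; a minimal sketch (the title and author are made up):

```json
{
  "@context": "https://schema.org",
  "@type": "Book",
  "name": "An Example Book",
  "author": {
    "@type": "Person",
    "name": "Alice Example"
  }
}
```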
       
     
   
  
    
      Billions of Linked Data facts are publicly available.
    
      
        The most well-known dataset is DBpedia .
        
          Data is extracted automatically from Wikipedia . 
          Like Wikipedia, it exists in several different languages. 
          Its quality is acceptable for many queries. 
         
       
      
        Wikidata  is a manually curated alternative.
        
          It has its own data model on top of RDF. 
          It grows fast and might overtake DBpedia. 
         
       
      
        You can find many other datasets on Datahub .
       
     
   
  
    
      RDF Schema is an RDF vocabulary for describing other RDF vocabularies.
    
      
        RDFS defines classes, properties, and datatypes 
      
        RDFS defines concepts in two namespaces.
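A sketch of RDFS in use (all ex: terms are hypothetical; the rdf: and rdfs: terms are standard):

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

ex:Artist a rdfs:Class ;
    rdfs:subClassOf ex:Person .

ex:painted a rdf:Property ;
    rdfs:domain ex:Artist ;
    rdfs:range  ex:Artwork ;
    rdfs:label  "painted" .
```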
        
       
     
   
  
    
      Practitioners in the RDF world often refer to vocabularies as ontologies.
     
    
   
  
    
    
      
        RDFS captures basic ontological relations, but cannot express concepts such as:
          cardinality restrictions on properties 
          inverse, symmetric, and transitive properties 
          equality and disjointness 
          … 
         
       
      
        OWL extends RDFS with advanced concepts.
        
          RDFS and OWL are used side by side. 
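A sketch of such OWL axioms (the ex: properties are hypothetical; the owl: terms are standard):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/> .

# Inverse properties: if A influenced B, then B was influenced by A.
ex:influenced owl:inverseOf ex:influencedBy .

# A symmetric property holds in both directions.
ex:collaboratedWith a owl:SymmetricProperty .

# Equality between two identifiers for the same resource.
ex:picasso owl:sameAs <http://dbpedia.org/resource/Pablo_Picasso> .
```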
         
       
     
   
  
    
    
      
        SPARQL is a query language.
        
          Select specific data from an RDF dataset. 
          Insert, change, or delete data in an RDF dataset. 
         
       
      
        The SPARQL protocol is a Web API definition for executing queries over HTTP.
          
            A SPARQL endpoint executes SPARQL queries sent by clients.
         
       
     
   
  
  
    
    
      
        A BGP is a set of triple patterns .
        
          Their syntax is a superset of Turtle. 
         
       
      
        A triple pattern 
        is a triple in which any component can be a variable.
        
          Variables start with a question mark (?name). 
         
       
      
        A SPARQL query engine finds solution mappings .
        
          
            Variables and blank nodes are mapped to URIs, blank nodes, or literals.
         
       
     
   
  
    
      This query searches DBpedia
      for artists influenced by Picasso.
     
    
# These prefixes are predefined on the DBpedia endpoint,
# but are declared here so the query is self-contained.
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?person ?personLabel WHERE {
  ?person a dbo:Artist.
  ?person foaf:name ?personLabel.
  ?person dbo:influencedBy dbr:Pablo_Picasso.
}
      Here is the live result  of that query.
    
   
  
    
      This query searches Wikidata
      for artists influenced by Picasso.
     
    
# These prefixes are predefined on the Wikidata Query Service,
# but are declared here so the query is self-contained.
PREFIX wd:       <http://www.wikidata.org/entity/>
PREFIX wdt:      <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:       <http://www.bigdata.com/rdf#>

SELECT DISTINCT ?person ?personLabel WHERE {
  # occupation (P106) is artist (Q483501) or a subclass (P279) of it
  ?person wdt:P106/wdt:P279* wd:Q483501.
  # influenced by (P737) Pablo Picasso (Q5593)
  ?person wdt:P737 wd:Q5593.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
      Here is the live result  of that query.
    
   
  
    
      Why are the results and the queries different?
     
    
      
        DBpedia and Wikidata contain different data.
        
          So far, so good. This was expected. 
         
       
      
        DBpedia and Wikidata use different ontologies.
        
          Ideally, the same SPARQL queries would suffice. 
          …and they can, when ontologies link to each other. 
         
       
      
        In practice, some bridging is still needed.
        
          Semantic Web reasoning can bridge the gap, but most endpoints do not have it enabled.
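Such a bridge could take the form of alignment axioms; a hypothetical sketch (the equivalence statement is illustrative, not an official mapping):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix wd:  <http://www.wikidata.org/entity/> .

# Hypothetical alignment between the two "influenced by" properties.
dbo:influencedBy owl:equivalentProperty wdt:P737 .

# Identity link between the two Picasso identifiers.
dbr:Pablo_Picasso owl:sameAs wd:Q5593 .

# With reasoning enabled, a query using dbo:influencedBy could then
# also match data expressed with wdt:P737.
```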
         
       
     
   
  
  
    
      Heterogeneity exists on multiple levels 
    
      
        Heterogeneity exists on the data  level.
        
          We can choose our own vocabularies. 
          How do we ensure they align? 
         
       
      
        Heterogeneity exists on the interface  level.
        
          We can choose how consumers can query our data. 
          How can clients consume multiple datasets easily? 
         
       
     
   
  
    
      Heterogeneity is our best friend and our worst enemy.
    
      
        Anybody on the Web is free to model data as they see fit.
          This works great for people —sometimes. 
          It often doesn’t work great for machines . 
         
       
      
        Standardization helps us align.
        
          delicate balance between flexibility and interoperability 
         
       
     
   
  
  
    
      Standardization and agreement exist on several levels.
    
      
        the Semantic Web family of standards
        
       
      
        ontologies and vocabularies
        
          Dublin Core 
          DBpedia ontology 
          Wikidata ontology 
          Schema.org 
          … 
         
       
      
        Web APIs (no agreement yet)
        
          Linked Data Platform 
          OAI-ORE 
         
       
     
   
  
    
      The current level of standardization is insufficient in several areas.
    
      
        vocabulary usage
        
       
      
        vocabulary agreement
        
          the right terms for the right clients 
         
       
      
        Web APIs
        
          stop reinventing the wheel 
         
       
     
   
  
    
      Which vocabularies should we use?
    
      
        We need to develop examples and guidance.
        
          vocabulary usage 
          URL strategy 
          … 
         
       
      
        Reasoning can fill vocabulary gaps.
        
       
      
        We can never cover all  vocabularies.
        
       
     
   
  
    
      Web APIs are the Achilles’ heel of Linked Data.
    
      
        Shall we all have our SPARQL endpoints?
        
       
      
        Shall we all support the Linked Data Platform?
        
          That doesn’t solve querying… 
         
       
      
        Shall we all have our own custom APIs?
        
          That’s not a sustainable way. 
         
       
     
    
      See more in the REST  module.
    
   
  
    Self-assessment 1: HTTP URLs 
    Why are HTTP URLs important for Linked Data?
    
      
        HTTP URLs are not  important for Linked Data.
        No. While not necessary for the RDF data model itself,
            HTTP URLs enable dereferencing and thus linking data on the Web.
      
        Because they guarantee consistent semantics.
        No. Semantic consistency comes from
            the reuse of unique  concept identifiers (URIs),
            not specifically from HTTP URLs.  
      
        So we can look up (un)known concepts.
        Yes. HTTP URLs can be dereferenced:
            to obtain more data about a concept, follow its URL.
     
   
  
    Self-assessment 2: OWL and RDFS 
    Which of the following propositions are true?
    
      
        RDFS and OWL are an answer to Schema.org.
        No: Schema.org is (mainly) a vocabulary
            with terms to describe concrete things,
            such as books, people, articles, …
            RDFS and OWL contain terms to model other ontologies or vocabularies
            (such as Schema.org).  
      
        OWL replaces RDFS.
        No: RDFS is still required to express basic ontological relations.
            As such, RDFS and OWL are often used side by side.  
      
        OWL extends RDFS.
        Yes: OWL extends RDFS with more advanced ontological concepts.  
     
   
  
    Self-assessment 3: SPARQL 
    What is SPARQL?
    
      
        A data model.
        No. The data model is RDF.  
      
        A query language.
        Yes, SPARQL is a query language for RDF.  
      
        A protocol.
        Yes, SPARQL is a protocol to execute SPARQL queries over HTTP.  
     
   
  
    Self-assessment 4: SPARQL queries 
    Does the same SPARQL query return the same results over different sources about the same metadata?
    
      
        In theory, but not in practice.
        Yes: in theory, SPARQL queries should be interoperable across data sources. Even if different sources use different ontologies, reasoning can bridge the gap.
      
        In practice, but not in theory.
        No: in practice, datasets use different ontologies and most endpoints do not have reasoning enabled to bridge the gap.  
      
        In theory and  in practice.
        No: in practice, datasets use different ontologies and most endpoints do not have reasoning enabled to bridge the gap.