Semantic Web for Google

If you spend some time trudging through Google’s developer documentation, you’ll find this page, which describes how to mark up an Article (such as a blog post) with JSON-LD and Schema.org terms. Google use this information to understand your content, which helps them get it to the people who actually want to read it.

Google recommend the JSON-LD format, a standard Semantic Web format that, like regular JSON, can be served on its own or embedded in a web page. If you view this page’s source code you’ll see JSON-LD embedded in it, but if you request the same URL with an Accept header of “application/ld+json”, you’ll get a JSON-LD response instead. You can test this on Linux with curl. From searching my logs, though, it appears that Google only scrape the JSON-LD embedded in the HTML and don’t request any of the Semantic Web formats.

curl -H "Accept: application/ld+json" http://www.paulbrownmagic.com/blog/semantic_web_google
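Serving two formats from one URL like this is HTTP content negotiation: the server reads the request’s Accept header and picks a representation. If you’re on Flask, Werkzeug’s `request.accept_mimetypes.best_match()` already does this for you; the standard-library sketch below just shows the idea. The function names and the choice of HTML as the fallback are illustrative assumptions, and it skips finer points of the HTTP spec such as ranking specific types above wildcards at equal q.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client doesn't give one
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_format(header, offered=("text/html", "application/ld+json")):
    """Return the first offered media type the client accepts, else HTML."""
    for media_type, q in parse_accept(header or ""):
        if q <= 0:
            continue  # q=0 means the client refuses this type
        major, _, minor = media_type.partition("/")
        for candidate in offered:
            if (media_type == candidate
                    or media_type == "*/*"
                    or (minor == "*" and candidate.startswith(major + "/"))):
                return candidate
    return offered[0]
```

Plain curl sends `Accept: */*`, so `choose_format("*/*")` falls back to HTML, while the curl command above asks for `application/ld+json` explicitly and gets JSON-LD.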

Google are also big fans of Schema.org: they use it as their public ontology and ask developers to tag their pages with its terms. Schema.org is not a complete ontology; it is built from a practical, on-demand perspective, so when enough people ask for a term, it gets added. It’s more about getting the job done than getting it academically perfect.

Whilst it is possible to use RDFlib-JSONLD to serialise a graph into JSON-LD, the structure of the embedded JSON-LD is the same for every post, so it is computationally cheaper to generate it with a Jinja2 template. A template also means you don’t have to store your data using Schema.org terms.

I’m going to assume you already have a back-end storing your blog posts with all the data fields you’ll need, and that you already pass these to your template to populate the title, date, content and so on. Here’s my Jinja2 template for the Google/Schema.org JSON-LD. I put it in a file called jsonld.html in my includes directory and include it in my blog post template.
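The template only reads a handful of attributes from `post`. As a sketch of what that back-end record might look like, here’s a hypothetical dataclass: the class name, field types and example values are assumptions for illustration, and only the attribute names come from the template. Jinja2 renders `{{ post.date }}` via `str()`, which for a `datetime.date` gives an ISO 8601 date, the format Schema.org uses for datePublished; a `datetime.datetime` would render with a space between date and time, so `.isoformat()` is the safer choice there.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical blog-post record exposing the attributes jsonld.html reads."""
    path: str                     # URL of the post -> mainEntityOfPage/@id
    title: str                    # -> headline
    description: str              # -> description
    date: datetime.date           # -> datePublished
    date_modified: datetime.date  # -> dateModified
    images: list = field(default_factory=list)  # image URLs -> image


# Hypothetical example values, for illustration only
post = Post(
    path="https://example.com/blog/my_post",
    title="My post",
    description="A short summary of my post.",
    date=datetime.date(2018, 5, 1),
    date_modified=datetime.date(2018, 5, 3),
    images=["https://example.com/static/images/my_post.png"],
)

# Jinja2 renders {{ post.date }} with str(), which for a date is ISO 8601
print(str(post.date))  # 2018-05-01
# A datetime's str() uses a space; isoformat() gives the "T"-separated form
print(datetime.datetime(2018, 5, 1, 9, 30).isoformat())  # 2018-05-01T09:30:00
```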

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "{{ post.path }}"
  },
  "image": [{% for image in post.images %}"{{ image }}"{% if not loop.last %}, {% endif %}{% endfor %}],
  {# tojson adds the surrounding quotes and escapes quotes, backslashes and line breaks, so free text can't break the JSON #}
  "headline": {{ post.title|tojson }},
  "datePublished": "{{ post.date }}",
  "dateModified": "{{ post.date_modified }}",
  "description": {{ post.description|tojson }},
  "author": {
    "@type": "Person",
    "name": "Paul Brown",
    "url": "{{ url_for('main.home_page', _external=True) }}",
    "sameAs": [
      "https://www.facebook.com/paulbrownmagic",
      "https://twitter.com/PaulBrownMagic",
      "https://www.linkedin.com/in/paul-brown-0533b084/",
      "https://www.youtube.com/user/JesterMagician/"
    ]
  },
  "publisher": {
    "@type": "Organization",
    "name": "PaulBrownMagic",
    "logo": {
      "@type": "ImageObject",
      "url": "{{ url_for('main.static', filename='images/icons/PaulBrownMagic.png', _external=True) }}"
    }
  }
}
</script>
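Because a template like this writes JSON by hand, it’s worth checking that rendered pages still parse: if any field is rendered without JSON escaping, a line break or stray backslash in, say, a description is enough to produce invalid JSON that Google can’t read. Here’s a standard-library sketch that pulls every `<script type="application/ld+json">` block out of a page and runs it through `json.loads`; the class and function names are hypothetical. It only checks that the JSON is well formed, not that the properties Google asks for are present; Google’s Rich Results Test covers that.

```python
import json
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append rather than assign
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_jsonld(html):
    """Return the parsed JSON-LD objects embedded in an HTML page.

    Raises json.JSONDecodeError if any block is not valid JSON.
    """
    parser = JSONLDExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]
```

You could run `extract_jsonld` over the HTML of each rendered post in a test suite, so a badly escaped field fails the build rather than silently going out to Google.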

Feel free to copy and paste, but make sure you update the information; please don’t go telling Google you’re the sameAs me! This post is the last in a three-part series on publishing data for big consumers: the first was Semantic Web for Facebook, followed by Semantic Web for Twitter. Next we’ll start looking at how to use data available on the Semantic Web.

Until next time, happy coding.
Paul
