Starting with Knowledge Graphs

FIRST THINGS first, we need to know what a “Knowledge Graph” is. The term was popularised by Google because the existing term, “ontology”, sounded too much like “oncology”, and no-one wants that association when they’re trying to sell a product or idea. So there are two terms in use: “Knowledge Graph” is more often used in media and marketing, while “ontology” is used more in academia and by the “in-group”. We’ll use the term “ontology” because the origin of the word describes what it is.

So what is an ontology? It originates in philosophy, and is a “formal description of the classes and relationships in a domain”. Let’s break that apart. It describes classes in a manner akin to Aristotle’s taxonomies; examples would be “mammal”, “human”, “vehicle”, “fruit”, “liquid”. Classes are the “ideal” forms of what is being described, so a human may be defined as having two arms and two legs, but there are instances of humans for which this is not true.

The relations between classes describe how they are related to one another. The most important is the subsumption relation, which is a stricter version of subclassing. However, there are many other important relations that might be of use, such as “part of”, “member of”, “has role”, “has quality”, “precedes”, etc.

Ontologies are typically restricted to a single domain: given the vast scope of all knowledge, it would be a monumental task to try to capture it all!

Finally, the description is formal, which for a computer scientist means it’s represented in some kind of logical language, typically a Description Logic. This is what gives ontologies their power: the use of logic to infer new information. For example, if the ontology states that every human is a mammal and that Socrates is a human, a reasoner can conclude that Socrates is a mammal without anyone having to say so explicitly.

Why Ontology?

I’M GOING to assume you’re reading this because you’ve already decided to use ontology. So I’m only going to summarise what I think the major benefits of using ontology are:

  • Self-defining: always documented
  • Representation of truth, as opposed to the data you need for an application: makes your data resilient to changes in use-case
  • Formal: can infer new information
  • Simulates mental model: can be used for/by AI agents

Historically Inspirational Technologies

KNOWING, RESEARCHING and playing with some of the technologies that influenced the thinking that created Knowledge Graphs can be a real short-cut to understanding what they are, how they’re used, and what they mean. So OWL, which is the most common language used for describing ontologies, was inspired by Description Logics and Frames.

Description Logics are a subset of First-Order Logic. The important bits you need to know are that relations can only be up to arity 2 (binary), and that you can’t check whether two variable fillers are the same. In exchange, reasoning over anything written in this subset is decidable, so that’s useful. Description Logics are why we represent things in triples: entity-relationship-entity.
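To make that concrete, here’s a tiny sketch in Python using the rdflib library (my choice of tooling, with a made-up example.org namespace). Every statement is just entity-relationship-entity:

    from rdflib import Graph, Namespace, RDF, RDFS

    # A hypothetical namespace for our example entities
    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("ex", EX)

    # Each statement is a triple: entity - relationship - entity
    g.add((EX.Human, RDFS.subClassOf, EX.Mammal))   # subsumption
    g.add((EX.socrates, RDF.type, EX.Human))        # class membership
    g.add((EX.hand, EX.partOf, EX.arm))             # a "part of" style relation

    for s, p, o in g:
        print(s, p, o)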

Frames are kind of like a precursor to Object-Oriented programming: they have an identity and slots that have fillers. This format can be transformed to triples by repeating the identity as the first entity for each slot and filler. Frames typically made use of the “a kind of” relation to capture subsumption. It’s worth implementing something with frames in Prolog to see how knowledge graphs can be used. There are examples for various AI systems, but they’re not too common given the unfathomable lack of popularity of the language in recent times. Doing this exercise will give you a grounding in what you can do with a declarative language and where it fits into a broader system. It’ll put you in the same mindset as some of the early advocates of ontology for the Semantic Web.
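If Prolog isn’t handy, here’s a rough sketch of the frames-to-triples flattening in Python instead (purely illustrative; the frame layout and slot names are made up):

    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/")

    # A frame: an identity plus slots that have fillers
    human_frame = {
        "identity": "Human",
        "slots": {
            "aKindOf": "Mammal",        # the "a kind of" relation captured subsumption
            "typicalLimbCount": "four",
        },
    }

    # Flatten to triples by repeating the identity as the first entity
    # of every slot/filler pair
    g = Graph()
    g.bind("ex", EX)
    subject = EX[human_frame["identity"]]
    for slot, filler in human_frame["slots"].items():
        g.add((subject, EX[slot], EX[filler]))

    print(g.serialize(format="turtle"))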

The Role of the Semantic Web

ONTOLOGIES AND Semantic Web aren’t really synonymous, but they’re often used in the same breath. The Semantic Web is the most recent foray into the goals that began with earlier declarative, intelligent information storage and retrieval. As such, the technologies developed here are newer, incorporate more up-to-date research and thinking, and are more abundant and easily available. There’s nothing forcing you to use Semantic Web technologies for your ontology, but if you do stick with them you can take advantage of a lot of other people’s work in the area. For example, there are visualisation tools like VOWL, checkers like OOPS, documentation generation with WIDOCO, and debuggers like OntoDebug, all working with OWL.

So OWL is the language used to represent ontologies in the Semantic Web. Do yourself a favour: don’t start by reading the technical documents! I’d recommend working through the OWL book first, as it’s a friendlier introduction! You’re also going to bump into stuff about RDF/XML, JSON-LD, Turtle, N3 and other formats for representing your OWL ontology. They’re just different markups that you can choose when you’re storing or transferring data in files. Don’t worry about it too much, as you won’t often be reading or dealing with these directly.
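To illustrate the “just different markups” point, here’s a quick sketch (again assuming Python with rdflib): the same triples written in Turtle can be re-serialised as RDF/XML, or any of the other formats, without losing anything.

    from rdflib import Graph

    # The same two statements written in Turtle...
    turtle_data = """
    @prefix ex: <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:Human rdfs:subClassOf ex:Mammal .
    ex:socrates a ex:Human .
    """

    g = Graph()
    g.parse(data=turtle_data, format="turtle")

    # ...can be re-serialised as RDF/XML (or JSON-LD, N3, ...) with no loss
    print(g.serialize(format="xml"))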

Most ontology development is done with Protégé; the desktop version is better than the web one. This is an editor designed from the ground up for creating and exploring ontologies. It comes with a reasoner built in that you can use to see what you can infer, plus there are lots of useful add-ons like VOWL, OntoDebug, and some matrix-style views that give you a different perspective on your ontology.
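If you’d rather poke at inference programmatically than through Protégé, here’s a minimal sketch using the Python owlrl package (my own substitution, not the reasoner Protégé ships with, but it shows the same idea of materialising inferred triples):

    from rdflib import Graph, Namespace, RDF, RDFS
    import owlrl

    EX = Namespace("http://example.org/")

    g = Graph()
    g.add((EX.Human, RDFS.subClassOf, EX.Mammal))  # every Human is a Mammal
    g.add((EX.socrates, RDF.type, EX.Human))       # Socrates is a Human

    # Expand the graph with everything RDFS semantics lets us infer
    owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

    # The reasoner has added the triple "Socrates is a Mammal"
    print((EX.socrates, RDF.type, EX.Mammal) in g)  # True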

For data storage there are dedicated triple-stores for Semantic Web ontologies, and these can also provide reasoning to take advantage of your ontology. A review in 2017 rated Fuseki and Stardog as the best, but I really like GraphDB. To query these you’ll be using SPARQL, another technology whose technical documentation I’d recommend avoiding until you’ve gained a grounding from a friendlier introductory source, like this blog post.
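To give a flavour of SPARQL before you find that friendlier introduction, here’s a toy query run locally with Python’s rdflib (an assumption of mine; against a real triple-store like Fuseki or GraphDB you’d send the same query text to its HTTP endpoint instead):

    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/")

    g = Graph()
    g.add((EX.Human, RDFS.subClassOf, EX.Mammal))
    g.add((EX.socrates, RDF.type, EX.Human))
    g.add((EX.plato, RDF.type, EX.Human))

    # Find everything that is (directly) typed as a Human
    query = """
    PREFIX ex: <http://example.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?thing WHERE {
        ?thing rdf:type ex:Human .
    }
    """

    for row in g.query(query):
        print(row.thing)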

IT’S A big subject to conquer, so let’s break it up a little.

Dipping a Toe In

Ontology Authoring

Triple-stores

Language Libraries

Python

Prolog

JavaScript

Java

Haskell