April 21, 2020
A few days ago I wanted to find a dataset with a list of winners of the Palme d’Or. The table on Wikipedia wasn’t formatted in a way that would be easy to copy, and I figured this was as good a time as any to learn how Wikidata works.
Wikidata is a Wikimedia Foundation project that aggregates structured information about the world. It stores and retrieves facts based on an item/property/value system. Items are nouns, like Parasite. Properties define a relation, such as “award received” or “director”. Each fact pairs a property with a value, which is usually another item, like Palme d’Or or Bong Joon-ho.
Wikidata provides a public query interface that uses a query language called SPARQL. No relation to Spark: it’s actually running on a graph database called Blazegraph. The query language works on a sort of fill-in-the-blanks basis with the item/property/value triples: you write a triple with a variable in place of each part you don’t know, and the query returns every value that fits.
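To make the fill-in-the-blanks idea concrete before getting to real queries, here is a toy sketch in Python. It is only an analogy, not how SPARQL or Blazegraph work internally: the `TRIPLES` list and the `match` helper are names I made up for illustration, and real Wikidata identifies items and properties by IDs rather than plain labels.

```python
# Toy triple store: each fact is an (item, property, value) tuple.
# The two facts are the examples from the text; the structure itself
# is an assumption made purely for illustration.
TRIPLES = [
    ("Parasite", "award received", "Palme d'Or"),
    ("Parasite", "director", "Bong Joon-ho"),
]


def match(pattern, triples=TRIPLES):
    """Return every triple that fits the pattern; None marks a blank."""
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# "Parasite / award received / ???": leave the value blank and collect it.
awards = [value for _, _, value in match(("Parasite", "award received", None))]
print(awards)
```

Any position can be the blank: `match((None, "director", None))` asks which items have a director at all, and who it is.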
For example, to see which awards Parasite won, you can write a query like this: