Robust identifiers leading to quality data

March 14, 2019 by Stefan Summesberger

Phil Archer joined GS1 - the standards body behind the barcode - in 2017 to bring an in-depth knowledge of the Web to the world of supply chains for retail, healthcare and more. Before that, Phil was the Data Activity Lead at W3C, the industry standards body for the World Wide Web. We still have vivid memories of his opening keynote at SEMANTiCS 2014, which is why, for this year’s 15th anniversary edition, the time has come to catch up. In this interview, Phil explains how he’s working with colleagues to introduce Linked Data and semantics to supply chain and consumer data, and gives advice on how to create value from that data and how to prepare to do so.

At GS1, you are trying to introduce Linked Data and semantics to supply chain and consumer data. Please elaborate on your projects: Where did you start, what is the main goal, where are you now and what are the next challenges?

The barcode was introduced at point of sale in 1974, 15 years before the conception of the Web. It is just the visible part of the identifiers and information systems that underpin the manufacturing, supply chain and retail industries. For a variety of reasons, hitherto these industries have not embraced the Web as a data sharing platform. All we’re doing now is putting the identifier system together with the Web. Basically this:

https://example.com/

The standard we published last August details how to convert any set of GS1 identifiers into a URI (we have a lot more than the familiar UPC/EAN codes that we refer to as GTINs). See https://www.gs1.org/standards/Digital-Link/ for the spec; you can play around with creating what we call Digital Links at https://gs1.github.io/GS1DigitalLinkToolkit.js/ (Digital Link is our name for a URI that encodes GS1 identifiers, although this term was coined after the spec had been written!).
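To make the identifier-to-URI idea concrete, here is a minimal Python sketch that validates a GTIN and turns it into a URI of the shape shown later in this interview. The check digit calculation is the standard GS1 mod-10 scheme; the function names, the default resolver domain and the choice to pad every GTIN to 14 digits are assumptions made for illustration, not something taken from the spec.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 mod-10 check digit for the digits that precede it."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(gtin: str) -> bool:
    """Accept GTIN-8, -12, -13 and -14 strings whose check digit matches."""
    if not (gtin.isascii() and gtin.isdigit()):
        return False
    if len(gtin) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(gtin[:-1]) == int(gtin[-1])


def gtin_to_uri(gtin: str, resolver: str = "https://id.gs1.org") -> str:
    """Build a Digital Link style URI for a GTIN (illustrative helper).

    Assumption: the GTIN is zero-padded to 14 digits, mirroring the
    example URI quoted in this article.
    """
    if not is_valid_gtin(gtin):
        raise ValueError(f"not a valid GTIN: {gtin!r}")
    return f"{resolver.rstrip('/')}/gtin/{gtin.zfill(14)}"


if __name__ == "__main__":
    print(gtin_to_uri("5011157888163"))
```

Validating the check digit before building the URI keeps mistyped identifiers from ever reaching a resolver.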

Do you want to create more value from your data? Learn how at SEMANTiCS 2019!

We’re now working on phase II of the spec in which we’re defining the concept of a GS1 conformant resolver, that is, a Web server that understands the Digital Link syntax and can redirect requests to relevant resources. So for a given food product you might have a product description page, some recipe ideas, a TV ad etc.
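As a rough sketch of what such a resolver does, the Python below looks up a GTIN in a link table and picks a redirect target based on a linkType query parameter. Everything specific in it is hypothetical: the linkType names ("default", "recipe", "ad"), the target URLs and the choice of a 307 redirect are invented for illustration, and it is not the behaviour defined by the spec.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical link table. The GTIN comes from this article; the linkType
# names and target URLs are made up for the example.
LINKS = {
    "05011157888163": {
        "default": "https://brand.example/products/05011157888163",
        "recipe": "https://brand.example/recipes/05011157888163",
        "ad": "https://brand.example/tv-ad/05011157888163",
    }
}


def resolve(uri: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a Digital Link style URI."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "gtin":
        return 400, None  # not a URI shape this sketch understands
    links = LINKS.get(segments[1])
    if links is None:
        return 404, None  # unknown product
    # Fall back to the default resource when no (or an unknown) linkType is given.
    requested = parse_qs(parts.query).get("linkType", ["default"])[0]
    return 307, links.get(requested, links["default"])


if __name__ == "__main__":
    print(resolve("https://id.gs1.org/gtin/05011157888163?linkType=recipe"))
```

Falling back to the default link for an unrecognised linkType is one possible design choice; a stricter resolver could reject the request instead.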

We’re not done yet. We have to finish the spec and sort out a few details but we aim to have the spec finished and ratified around May this year. NB: those ‘linkType’ values are not fully defined yet; they’re very brittle. Actually, one issue we need to cover fully is that of the semantics. In the examples above I’ve just given you the GTIN, but this:

https://id.gs1.org/gtin/05011157888163?exp=191100

includes the product’s expiry date. Does that mean that all instances of the class identified by https://id.gs1.org/gtin/05011157888163 expire at the end of November 2019? Nope. There’s an implied blank node in there that we need to make explicit.
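One way to picture the distinction is to split such a URI into the part that identifies the product class and the facts that only apply to the item in hand. The Python sketch below does that for the exp parameter used in the example above. The function names are hypothetical, mapping two-digit years to 20YY is a simplifying assumption, and treating day 00 as the last day of the month follows the reading given in the text (191100 meaning the end of November 2019).

```python
import calendar
from datetime import date
from urllib.parse import parse_qs, urlsplit


def parse_expiry(yymmdd: str) -> date:
    """Parse a YYMMDD expiry value; day 00 means the last day of the month."""
    if len(yymmdd) != 6 or not (yymmdd.isascii() and yymmdd.isdigit()):
        raise ValueError(f"expected six digits, got {yymmdd!r}")
    year = 2000 + int(yymmdd[:2])  # simplifying assumption: always 20YY
    month = int(yymmdd[2:4])
    day = int(yymmdd[4:6])
    if day == 0:
        day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, day)


def split_product_and_item(uri: str):
    """Separate the class-level URI from facts about one specific item."""
    parts = urlsplit(uri)
    product_uri = f"{parts.scheme}://{parts.netloc}{parts.path}"
    exp = parse_qs(parts.query).get("exp")
    # The expiry date describes the scanned item, not every product in the class.
    item_facts = {"expires": parse_expiry(exp[0])} if exp else {}
    return product_uri, item_facts


if __name__ == "__main__":
    print(split_product_and_item("https://id.gs1.org/gtin/05011157888163?exp=191100"))
```

Returning the item-level facts separately mirrors the idea of an unnamed node for the individual item, with the GTIN URI left to identify the class.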

From a business point of view we’re saying that a standard structure for URIs means:

  • An application can take any set of GS1 identifiers and construct a URI that can be dereferenced against a GS1 conformant resolver. So you don’t need to have the full URL in the data carrier (you don’t have to use QR or NFC).
  • If you do encode the full URL in the data carrier, then a generic app can follow that URL and reach a default customer experience.
  • A specialist app can add directives to the linkType parameter and ask for specific types of resource.
  • And a minor update to the world’s scanners would mean that, one day (not today), it’ll be possible to use things like QR codes at point of sale, so you’d only need one barcode per pack, not separate ones for supply chain applications and consumer interaction, as is common now.

Inferring meaning from a URI’s structure is against the architecture of the WWW, which says all URIs are dumb strings and you SHOULD NOT infer meaning from them. Sorry – we’re going against that (and we’re not alone). Worse, we’re saying that the domain name doesn’t matter. Take me away in chains, guvnor. But we need brand owners to be able to put their own domain names in their URIs and not to insist on a specific domain name. And we want a distributed network of resolvers, not a single point of failure etc.
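The point about domain names can be illustrated with a short Python sketch: an application reads the identifier out of the URI path whatever the host is, and can re-point the same path and query at a different resolver. The helper names and the brand.example domain are hypothetical, and only the /gtin/{gtin} path shape used in this article is handled.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def extract_gtin(uri: str) -> Optional[str]:
    """Read the GTIN from the path, ignoring which domain the URI uses."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    # Look for a "gtin" segment immediately followed by a run of digits.
    for key, value in zip(segments, segments[1:]):
        if key == "gtin" and value.isascii() and value.isdigit():
            return value
    return None


def rehost(uri: str, resolver_host: str) -> str:
    """Point the same identifiers at another resolver by swapping the host."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme, resolver_host, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    brand_uri = "https://brand.example/gtin/05011157888163?exp=191100"
    print(extract_gtin(brand_uri))
    print(rehost(brand_uri, "id.gs1.org"))
```

Because the identifiers live in the path and query, a brand owner’s URI and a URI on any other resolver carry the same information for a scanner or app.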

Within the GS1 community, Digital Link is positioned as a means of just using one barcode for everything and of simplifying the (surprisingly complex) task of providing business partners with things like images, descriptions and more. Linked Data as a term is explained in the spec and, I like to think, we’re creating a node on the LOD cloud (eventually), but that’s not how we’re pitching it.

Are you interested in Linked Data, Machine Learning and AI? Register for SEMANTiCS 2019!

So, what are the five most important steps in order to create value from supply chain and consumer data?

There’s nothing special about supply chain data that doesn’t apply universally. It’s about data quality, completeness and availability, based on a rigorously managed system of globally unique identifiers.

I’ve become more relaxed about Semantic Web purity (spend enough time with Dan Brickley and it comes naturally). Sure, I’d like there to be a network of SPARQL endpoints serving product data as RDF but the delta from where we are to get there is too big. What manufacturers, or their agents, do have is a catalogue of assets related to products. Some consumer-facing, some business-partner facing. Getting the right data to the right people is hard, simply because of the way manufacturers, especially large ones, operate and evolve over time. So we can’t think in terms of SPARQL endpoints. We can, however, think in terms of persistent URIs for every product (https://example.org/gtin/{gtin}) and try, over time, to make them dereferenceable in a useful manner, ideally leading to both human-readable material and some schema.org-based structured data. Make sure those links have a @rel attribute that comes from a controlled set of relationship types and… well, it’s pretty close to triples/hypermedia.
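A minimal Python sketch of that idea follows: a small schema.org description for a product URI, plus typed links whose relationship must come from a controlled set. The URI pattern comes from the paragraph above; the product name, the target URLs and the particular set of allowed relationship types are assumptions chosen for the example (the three used here are IANA-registered link relations, not a GS1 vocabulary).

```python
import json

# Assumed controlled set of relationship types for this sketch.
ALLOWED_RELS = {"describedby", "alternate", "related"}


def product_jsonld(gtin: str, name: str, base: str = "https://example.org/gtin/") -> dict:
    """Describe a product with schema.org terms, keyed by its persistent URI."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "@id": base + gtin,
        "gtin14": gtin,  # expects the 14-digit form of the GTIN
        "name": name,
    }


def link_header(links) -> str:
    """Format (rel, href) pairs as an HTTP Link header value."""
    formatted = []
    for rel, href in links:
        if rel not in ALLOWED_RELS:
            raise ValueError(f"relationship type not in controlled set: {rel!r}")
        formatted.append(f'<{href}>; rel="{rel}"')
    return ", ".join(formatted)


if __name__ == "__main__":
    print(json.dumps(product_jsonld("05011157888163", "Example product"), indent=2))
    print(link_header([("describedby", "https://example.org/specs/05011157888163.pdf")]))
```

Each typed link reads as subject (the product URI), predicate (the relationship type) and object (the target), which is why the result sits so close to triples.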

How can ecommerce companies adapt to the innovations you have in your pipeline? How can SMEs in Europe get prepared?

There are a whole bunch of steps to take but the good news is that they can be incremental. If you have a catalogue of products, start by putting in place some redirects so that the GS1 Digital Link syntax leads to the correct entry.
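For a shop with an existing catalogue, that first step could be as small as the WSGI sketch below, which redirects a /gtin/{gtin} request to the matching catalogue page. The catalogue mapping, the /shop/item/12345 path and the use of a 302 redirect are all hypothetical; a real site would more likely do this in its web server or framework configuration.

```python
import re

# Hypothetical mapping from 14-digit GTINs to existing catalogue pages.
CATALOGUE = {"05011157888163": "/shop/item/12345"}

# Accept 8 to 14 digits after /gtin/, with an optional trailing slash.
GTIN_PATH = re.compile(r"^/gtin/(\d{8,14})/?$")


def app(environ, start_response):
    """Minimal WSGI app: redirect Digital Link style paths to catalogue entries."""
    match = GTIN_PATH.match(environ.get("PATH_INFO", ""))
    # Shorter GTINs are zero-padded so they match the 14-digit catalogue keys.
    entry = CATALOGUE.get(match.group(1).zfill(14)) if match else None
    if entry is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown product\n"]
    start_response("302 Found", [("Location", entry)])
    return []


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serves on localhost:8000 for a quick manual check.
    make_server("localhost", 8000, app).serve_forever()
```

Because the shim only adds redirects, the existing catalogue pages and URLs stay exactly as they are, which is what makes the step incremental.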

Work with manufacturers to collate links related specifically to those products (not to some brand home page – that’s next to useless!).

There will be more coming down the pipe in a few months’ time, including a full implementation guide, but in the meantime, we’re ready right now with our experimental resolver to support proofs of concept. Please contact me (phil.archer@gs1.org) to see what we might be able to do together.

Which medium-sized company can you think of spontaneously that handles the digital transformation particularly well? Why?

It’s hard to answer that in public, sorry, but I’ll give you one example we’re working with.

There’s a membership organisation in Belgium and Luxembourg, pharma.be, that, among other things, manages a compendium of electronic patient information leaflets for medicines (https://www.e-compendium.be/). It’s a classic portal providing a search box that can lead a human to any eLeaflet in any of the three official languages in Belgium and Luxembourg (DE, NL, FR). Importantly from a GS1 point of view, they already have a mapping from each leaflet to the GTIN of the medicine in question. So what if we could just scan the pack with a mobile phone and get the eLeaflet in whatever language our phone works in?
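The language part of that scenario can be sketched with ordinary HTTP content negotiation: the phone sends an Accept-Language header and the service picks the best matching leaflet. In the Python below, the leaflet URLs, the function names and the French fallback are hypothetical, and the header parsing is deliberately simplified.

```python
from typing import Dict, List


def preferred_languages(accept_language: str) -> List[str]:
    """Return language tags from an Accept-Language header, best first."""
    ranked = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        if not tag:
            continue
        quality = 1.0  # a tag without a q value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # ignore tags with a malformed q value
        ranked.append((quality, tag.strip().lower()))
    # Python's sort is stable, so equal q values keep their header order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for quality, tag in ranked if quality > 0]


def pick_leaflet(leaflets: Dict[str, str], accept_language: str, fallback: str = "fr") -> str:
    """Choose the leaflet URL whose language best matches the phone's settings."""
    for tag in preferred_languages(accept_language):
        primary = tag.split("-")[0]  # "nl-be" matches the "nl" leaflet
        if primary in leaflets:
            return leaflets[primary]
    return leaflets[fallback]  # assumed default when nothing matches


if __name__ == "__main__":
    # Hypothetical leaflet URLs for one medicine, keyed by language.
    leaflets = {
        "de": "https://leaflets.example/05011157888163/de",
        "nl": "https://leaflets.example/05011157888163/nl",
        "fr": "https://leaflets.example/05011157888163/fr",
    }
    print(pick_leaflet(leaflets, "nl-BE,nl;q=0.9,en;q=0.8"))
```

The existing mapping from GTIN to leaflet supplies the table; the scan only has to add the language preference.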

There’s a similar compendium in Norway, and others elsewhere. From a pharma company’s point of view, it’s about easy access to patient info. That needs to be the right leaflet for the right drug. Since GTINs are cross-border identifiers, that’s a good place to start.

It always comes back to the same things: robust identifiers leading to quality data. Nothing new there…

About SEMANTiCS

The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, understand its benefits and know its limitations. Every year, SEMANTiCS attracts information managers, IT architects, software engineers and researchers from organisations ranging from NPOs, universities and public administrations to the largest companies in the world. http://www.semantics.cc