Using GBFS within a Linked Data/RDF publishing strategy
Who am I
I’m a professor at Ghent University in Belgium, researching how to publish knowledge at web scale. Previous work of my research team includes Linked Connections, a lightweight interface for public transit route planning; helping the European Railway Agency publish a dataset on railway infrastructure; and helping the Flemish government publish its base registries, such as its address database. Today we’re working on the Flemish Sensor Data Space, in which we have a use case on bike sharing.
Motivating user stories
- As a data publisher, I want to use the GBFS terminology to annotate my website about my bike sharing initiative (e.g., with RDFa or together with schema.org)
- As the Flemish government, I want to align our vocabularies with GBFS and link towards the terms in the authoritative specification
- As a data consumer working on smart city infrastructure, I want to import GBFS data in my city’s NGSI-LD context broker
Solution
Convert the terms you define in the JSON Schema into an RDFS vocabulary. This can be done using a one-to-one mapping (I’m willing to open a pull request for this if desired).
What should be the base URL on which all terms will be dereferenceable?
I’d propose https://w3id.org/gbfs#. This way, for example, the term num_docks_available would get the URI https://w3id.org/gbfs#num_docks_available. We can open a pull request at https://github.com/perma-id/w3id.org to add a redirect from w3id.org/gbfs to, for example, a GitHub Pages site on this repository with this RDF file behind it. This way machines will be able to look up the authoritative definitions.
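To make the one-to-one mapping concrete, here is a minimal sketch in Python. It assumes the proposed (not yet registered) namespace https://w3id.org/gbfs#, and the schema excerpt, its descriptions, and the function names `term_uri` and `schema_to_turtle` are hypothetical, chosen only for illustration. It is not the actual conversion tooling.

```python
import json

# Assumption: the namespace proposed above; it is not registered at w3id.org yet.
GBFS_NS = "https://w3id.org/gbfs#"

# Hypothetical, heavily trimmed schema excerpt; the descriptions are illustrative.
SCHEMA = json.loads("""
{
  "properties": {
    "num_docks_available": {"type": "integer", "description": "Number of empty docks."},
    "num_bikes_available": {"type": "integer", "description": "Number of available bikes."}
  }
}
""")


def term_uri(name, ns=GBFS_NS):
    """Build the URI of a term by appending its name to the namespace."""
    return ns + name


def schema_to_turtle(schema, ns=GBFS_NS):
    """Emit one rdf:Property per JSON Schema property, as Turtle text."""
    lines = [
        f"@prefix gbfs: <{ns}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for name, spec in sorted(schema.get("properties", {}).items()):
        # json.dumps yields a quoted, escaped string that is also a valid
        # Turtle literal for simple descriptions like the ones above.
        comment = json.dumps(spec.get("description", ""))
        lines.append(f"gbfs:{name} a rdf:Property ;")
        lines.append(f'    rdfs:label "{name}" ;')
        lines.append(f"    rdfs:comment {comment} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(term_uri("num_docks_available"))
    print(schema_to_turtle(SCHEMA))
```

The resulting Turtle file is the kind of document the w3id.org redirect could point to, so that looking up a term URI returns its definition.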
Is your potential solution a breaking change?
- Certainly not
It is probably a good idea to wait until this breaking change has passed: https://github.com/NABSA/gbfs/pull/354
Hello @pietercolpaert, I'm a Product Manager at MobilityData, working on our tools and initiatives to increase data quality. 👋 Thanks for opening up this discussion.
I have very limited experience with linked data, RDF, and context information. I think this is a great opportunity; there are discussions in GTFS around versioning and URL schemes that mention linked data.
I have a few questions to get a better understanding of what this proposal would imply:
As a data publisher, I want to use the GBFS terminology to annotate my website about my bike-sharing initiative (e.g., with RDFa or together with schema.org)
- Why? In order to increase the discoverability? What are the motivations? Do you have an example of this in another area?
- If we were to build an RDF schema vocabulary, could it replace the JSON Schema, or would they be complementary?
- Did you consider JSON-LD?
- Do you foresee any disadvantages or risks? e.g. higher complexity for consumers, or higher barrier to entry for producers
- What could be other advantages of using linked data?
- How exactly could it help with the machine readability of GBFS?
- What would be the impact on versioning and on discoverability for different versions (currently covered by gbfs_versions.json)
Hi @isabelle-dr thanks for getting back to me: much appreciated!
- Why would you annotate a web page with GBFS semantics? In order to increase the discoverability? What are the motivations? Do you have an example of this in another area?
Discoverability and interoperability are certainly big motivations:
- A schema.org example: for parking lots one would use this schema.org entity: https://schema.org/ParkingFacility. The same doesn’t yet exist for bike sharing. For that, schema.org would have to reinvent GBFS into schema.org. Instead I’d propose defining these terms in GBFS, so schema.org can refer to it. Schema.org is used by search engines to extract rich snippets from web pages (https://developers.google.com/search/docs/advanced/structured-data/intro-structured-data)
- A Wikidata+OpenStreetMap example: OpenStreetMap links train stations to Wikidata. For example, in the infobox of the Montréal Central Station in OSM, you can see there’s a link to the Wikidata entity. Wikidata in turn has a class description of a railway station, which in turn has a link to the schema.org equivalent class. Linked Data in this way builds interoperability between multiple domain models that slightly overlap, and makes their instance data automatically aligned. How cool would it be if you could find a bike sharing station on OSM that then gets a Wikidata page linked to a type that has both OSM tag descriptions and a GBFS URI of the term?
These two examples give an idea of the motivation behind Linked Data, which I like to summarize as drastically lowering the cost of integrating a dataset in a different domain.
- If we were to build an RDF schema vocabulary, could it replace the JSON Schema, or would they be complementary?
I was (and still am) proposing a complementary approach where we try to generate an RDFS vocabulary and SHACL schema based on the JSON Schema files. However, we already know from experimenting with it together with @andreipopi that additional configuration is going to be necessary, as there’s no full one-to-one mapping between these.
Just for completeness (this is not what I propose, as it would require changing your entire process as it is today and would broaden the scope of the GBFS schemas), the other way around would be possible in a more automated way: @ioggstream is working on RDF to JSON Schema: https://twitter.com/ioggstream/status/1473708713525534722
- Did you consider JSON-LD?
JSON-LD is one of the formats in which Linked Data can be serialized. What I propose above is a prerequisite for being able to use JSON-LD.
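To illustrate why the vocabulary comes first: a JSON-LD context is essentially a lookup table from JSON keys to term URIs, so those URIs need to exist before the context can be written. The sketch below assumes the proposed namespace https://w3id.org/gbfs#; the helper names and the station record are hypothetical.

```python
import json

# Assumption: the namespace proposed earlier in this thread.
GBFS_NS = "https://w3id.org/gbfs#"


def make_context(term_names, ns=GBFS_NS):
    """Map each plain JSON key to its term URI in a JSON-LD @context."""
    return {"@context": {name: ns + name for name in term_names}}


def annotate(record, context):
    """Return a copy of the record with the @context attached."""
    return {**context, **record}


# Hypothetical station record; the keys follow existing GBFS field names.
station = {"station_id": "hypothetical-1", "num_docks_available": 4}
doc = annotate(station, make_context(station.keys()))

if __name__ == "__main__":
    print(json.dumps(doc, indent=2))
```

The data keys themselves stay unchanged, which is why a JSON-LD context can be layered on top of existing GBFS JSON without breaking current consumers.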
- Do you foresee any disadvantages or risks? e.g. higher complexity for consumers, or higher barrier to entry for producers
The disadvantage is that you’re going to have to do a little bit more work. We’re going to document the extra configuration file that would be needed to describe how the JSON Schema can be translated into RDFS and SHACL. Things I am already thinking about:
Per JSON schema we’ll need:
- a base web address or namespace (URI) to start building the web addresses. This could be for example https://w3id.org/gbfs#, giving term addresses such as https://w3id.org/gbfs#num_bikes_available
- a type to give to the entity if the JSON schema describes an object
- how to map enums to codelists
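The three configuration items above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `SchemaMapping` structure, the URI naming pattern for code list entries, the choice of SKOS for code lists, and the enum values. Triples are kept as plain tuples to avoid depending on an RDF library.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass(frozen=True)
class SchemaMapping:
    """Hypothetical per-schema configuration."""
    namespace: str    # base URI used to mint term addresses
    entity_type: str  # class assigned to objects described by the schema


def type_triple(entity_uri, mapping):
    """State that an entity is an instance of the configured type."""
    return (entity_uri, RDF_TYPE, mapping.entity_type)


def enum_to_codelist(prop_name, values, mapping):
    """Turn enum values into a SKOS concept scheme (one possible convention)."""
    scheme = f"{mapping.namespace}{prop_name}_codelist"
    triples = [(scheme, RDF_TYPE, SKOS + "ConceptScheme")]
    for value in values:
        concept = f"{mapping.namespace}{prop_name}_{value}"
        triples.append((concept, RDF_TYPE, SKOS + "Concept"))
        triples.append((concept, SKOS + "inScheme", scheme))
        # The object here is a literal: the original enum string.
        triples.append((concept, SKOS + "notation", value))
    return triples


mapping = SchemaMapping(
    namespace="https://w3id.org/gbfs#",           # assumed namespace
    entity_type="https://w3id.org/gbfs#Station",  # hypothetical class name
)
# Illustrative enum values, not copied from the GBFS schemas.
codelist = enum_to_codelist("rental_methods", ["key", "creditcard", "phone"], mapping)

if __name__ == "__main__":
    for triple in codelist:
        print(triple)
```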
- What could be other advantages of using linked data?
- You can use the GBFS data model in more than just JSON. You’ll be able to use it in HTML annotations, RDF/XML, text/turtle, CSV on the Web, ...
- You can describe similarities to other domain models
- You will convince data publishers to also adopt a good identifier strategy for their own bike stations
- ...
- How exactly could it help with the machine readability of GBFS?
Alongside the JSON Schema tooling, RDF tooling will also be able to look up definitions and validate a file in any RDF serialization against the SHACL shape. I don’t see this as the biggest advantage.
- What would be the impact on versioning and on discoverability for different versions (currently covered by gbfs_versions.json)
We can also include the major version number of GBFS in the web address of the term. Otherwise I don’t expect any impact.
This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.
We are still working on a PR as a side-project. Not stale, give us a bit more time :)
This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.
Still working on it. We are:
- Creating a spec for adding tags to JSON schemas that can then allow a processor to translate the JSON schema to RDFS and SHACL
- Prototyping the actual processor
- Creating a github action that we could pull request here to automatically generate the Linked Data specs inside this repository and start from there to have more discussions
Will share the link to the spec, processor codebase and github action applied on the GBFS json schemas after validating it internally.
This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs. Thank you for your contributions.
You can find the code of our experiments here: https://github.com/jiaoxlong/json-schema-ld/tree/main
We found the generated RDF Vocabulary and SHACL shape at this moment to not be good enough. The idea however remains interesting to pursue.
Hi @pietercolpaert, I am a Product Manager for shared mobility at MobilityData. Thank you very much for sharing your work on Linked Data for GBFS. The topic of interoperability is very interesting and important to us. As per the governance, this issue will be closed in 30 days if there is no additional re-engagement. Have a great day! Fabien
This discussion has been closed due to inactivity. Discussions can always be reopened after they have been closed.