node-schema-org
FYI - schema.rdfs.org publishes schemas in JSON format
http://schema.rdfs.org/
http://schema.rdfs.org/all.json
Sadly, that page is no longer maintained. If you need the data, we scrape Schema.org every day and offer the result for download at: http://schema.link.fish
http://schema.link.fish/downloads/all.json
http://schema.link.fish/downloads/all.json.gz (same file, gzipped)
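For anyone who wants to read either download from a script, here is a minimal sketch. It assumes the file has already been saved locally; `load_schema` is a hypothetical helper name, not part of any library mentioned in this thread.

```python
import gzip
import json
from pathlib import Path


def load_schema(path):
    """Load a schema dump from a plain .json or a gzipped .json.gz file.

    Hypothetical helper: the only assumption is that the file holds one
    JSON document encoded as UTF-8.
    """
    path = Path(path)
    # Choose the opener from the file suffix; both are used in text mode.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `load_schema("all.json.gz")` on a local copy should return the same object as `load_schema("all.json")`, so the gzipped download can be used directly without unpacking it first.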
@janober neat! Do you use this library to do the scraping?
No, I have to confess I used the scraper from http://schema.rdfs.org/: https://github.com/mhausenblas/schema-org-rdf/tree/master/scrapers
It was the first one I found some time ago. I only found this library today.
Does this library still work or did it also break with the changes to the schema.org website?
Might have broken, but I know that @rektide is currently using it.
@janober cool to know about http://schema.link.fish/downloads/all.json !
Without any official json sources for schema.org do you intend to keep this one around as the best alternative?
Yes, that is the idea. I need the data and would have to parse it regularly anyway, so it is here to stay.
@janober That's good to know thanks.
Any reason why datatypes aren't available in the file?
@gebrits Sorry, there was a bug in the parse script. I fixed it, and the datatypes are back now.
cheers @janober :+1:
Hi @janober, sorry to bother you, but the supertypes property seems to be broken in a couple of instances.
More specifically any subtype of CreativeWork seems to have a supertypes = [] , while they should list supertypes = ["CreativeWork"] instead.
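A check like this could be automated. The sketch below is an assumption-heavy illustration: it assumes each type entry carries both a `supertypes` list (as mentioned above) and a `subtypes` list (not confirmed in this thread), and `find_supertype_gaps` is a hypothetical name. The sample data is hand-written, not taken from the real dump.

```python
def find_supertype_gaps(types):
    """Return (parent, child) pairs where the parent lists the child under
    "subtypes" but the child does not list the parent under "supertypes".

    Both key names are assumptions about the layout of the dump.
    """
    gaps = []
    for parent_id, parent in types.items():
        for child_id in parent.get("subtypes", []):
            child = types.get(child_id, {})
            if parent_id not in child.get("supertypes", []):
                gaps.append((parent_id, child_id))
    return gaps


# Tiny hand-written sample, not real data from the download.
sample_types = {
    "CreativeWork": {"subtypes": ["Book", "Movie"], "supertypes": ["Thing"]},
    "Book": {"subtypes": [], "supertypes": ["CreativeWork"]},
    "Movie": {"subtypes": [], "supertypes": []},  # mimics the reported problem
}

print(find_supertype_gaps(sample_types))  # [('CreativeWork', 'Movie')]
```

Run against the `types` section of the real file, an empty result would suggest the two lists agree with each other, though it would not prove that either list is complete.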
Ah yes, thanks, you are right. The new page structure is really not great for extracting information. However, I made some changes, and all the types I checked now seem to be fine.
Please tell me if you find any other issues.
Are you using the released schema.org version (at http://schema.org) or some develop-branch?
I'm asking because the latest JSON is missing a couple of properties that are available on the released schema.org. This list is not exhaustive (I'm only checking for the ones we're using), but at least these properties are missing:
branchCode,containedInPlace,containsPlace,screenCount,iataCode,icaoCode,character,commentCount,hasPart,license,countryOfOrigin,composer,iswcCode,lyricist,recordedAs,isrcCode,recordingOf,catalogNumber,creditedTo,recordLabel,releaseOf,containsSeason,dissolutionDate,parentOrganization,sport,athlete,coach
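One way to run this kind of check without relying on a text editor's search is to parse the file and test each name directly. This is only a sketch: it assumes the dump has a top-level `properties` object keyed by property name, and `missing_properties` is a hypothetical helper. The sample dictionary is hand-written for illustration.

```python
def missing_properties(schema, wanted):
    """Return the names from `wanted` that are absent from schema["properties"].

    Assumes "properties" is keyed by property name; that layout is an
    assumption, not something confirmed by an official spec.
    """
    available = schema.get("properties", {})
    return [name for name in wanted if name not in available]


# Hand-written sample, not real data from the download.
sample_schema = {"properties": {"branchCode": {}, "coach": {}}}

print(missing_properties(sample_schema, ["branchCode", "iataCode", "coach"]))
# ['iataCode']
```

With the real file parsed via `json.load`, passing the comma-separated list above split on `","` as `wanted` would show which names are actually absent.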
Ehm, all of them can be found under "properties" (at least the first few and the last one, the other ones I did not check because I also expect them to be there)
Hmm sorry about that. Sublime apparently choked on the file. All good now
No problem, great to hear ;-)
@jaygray0919 Saw in the other thread that you were wondering how the file gets generated.
The original scrape script can be found here: https://github.com/mhausenblas/schema-org-rdf. Sadly, it stopped working because the website changed, so all I did was fix it. The fixed version can be found in my fork here: https://github.com/janober/schema-org-rdf
That script was originally used by this website: http://schema.rdfs.org
However, they stopped supporting it, and for that reason the file they offer for download is very old. Because we need the data for our site link.fish anyway, I simply decided to take over from schema.rdfs.org and offer the scraped schema.org data for download.
Everything is also described on: http://schema.link.fish
Apologies for being pedantic, but does either of the mentioned JSON extracts conform to JSON Schema (http://json-schema.org)?