datasets

Datasets should be annotated with JSON-LD

Open: dennwc opened this issue 7 years ago • 7 comments

See https://developers.google.com/search/docs/data-types/dataset .

dennwc, Apr 25 '18

@campoy has been working on a proposal to add metadata to our datasets: https://github.com/src-d/guide/pull/163. It would be good to add this info to that discussion.

smola, Apr 26 '18

I'm curious: what are the benefits of JSON-LD over other formats such as PMML?

campoy, May 04 '18

@campoy Is PMML used for dataset metadata at all?

Anyway, I think the issue title is misleading: JSON-LD is the format, but schema.org/Dataset (+ Google extensions?) is the actual schema. It seems that Google will start using it to discover third-party datasets, which alone makes future adoption quite likely, and schema.org vocabularies also tend to see wider use in the long term.
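To illustrate the format/schema split: schema.org/Dataset supplies the vocabulary (`@type`, `name`, `description`, ...) and JSON-LD is just the JSON serialization that carries it. This is a minimal sketch, not a complete description; it uses only a small subset of schema.org/Dataset properties, the helper name `dataset_jsonld` is hypothetical, and all values are placeholders. Google's dataset documentation linked above lists which properties it actually requires and recommends.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, license_url: str,
                   keywords: list[str]) -> dict:
    """Build a minimal schema.org/Dataset description as a JSON-LD object.

    Hypothetical helper: only a handful of schema.org/Dataset properties
    are included, and the values passed in below are placeholders.
    """
    return {
        "@context": "https://schema.org/",  # JSON-LD: vocabulary the keys come from
        "@type": "Dataset",                 # schema.org type being described
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": keywords,
    }


if __name__ == "__main__":
    meta = dataset_jsonld(
        name="example-dataset",  # hypothetical placeholder
        description="A hypothetical dataset used only to illustrate the markup.",
        url="https://example.com/datasets/example-dataset",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["example", "source code"],
    )
    # Serialize to the JSON text that would be published alongside the dataset.
    print(json.dumps(meta, indent=2))
```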

As for the format itself, we might prefer JSON over XML (afaik JSON-LD is valid JSON), since it makes the metadata more convenient to parse.
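To illustrate the "JSON-LD is valid JSON" point: a consumer that only needs a few fields can read JSON-LD metadata with an ordinary JSON parser, here Python's standard `json` module. The snippet and the helper name `read_dataset_name` are hypothetical. This sketch treats the document as plain JSON only; full JSON-LD semantics (context expansion, `@id` linking) would need a dedicated JSON-LD processor, which it does not attempt.

```python
import json

# A hypothetical JSON-LD snippet describing a dataset with schema.org terms.
RAW = """
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "example-dataset",
  "description": "A hypothetical dataset used only to illustrate parsing."
}
"""


def read_dataset_name(raw: str) -> str:
    """Parse JSON-LD as plain JSON and return the dataset's name.

    No JSON-LD expansion or compaction is performed; keys are read
    exactly as they appear in the document.
    """
    doc = json.loads(raw)
    if doc.get("@type") != "Dataset":
        raise ValueError("not a schema.org Dataset")
    return doc["name"]


if __name__ == "__main__":
    print(read_dataset_name(RAW))  # prints: example-dataset
```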

smola, May 04 '18

I don't have much experience with this, so if @smola has a preference for JSON-LD and Google is also using it, I say let's go with that.

campoy, May 16 '18

Note that I have no strong preference for JSON-LD itself, since I've never really used it. I do, however, prefer adopting schema.org and similar vocabularies, and JSON over XML.

smola, May 18 '18

Related: https://ai.googleblog.com/2018/09/building-google-dataset-search-and.html

dennwc, Sep 30 '18

So, Where is Your Dataset?

It is probably clear by now that Dataset Search is only as good as the metadata that exists on the Web pages for datasets.

The most common answer to the question of why a specific dataset does not show up in our results is that the Web page for that dataset does not have any markup. Just pop that page into the Structured Data Testing Tool and you will see whether the markup is there. If you don't see any markup there, and you own the page, you can add it

Yes, basically, if we just annotate the dataset homepage with structured data (and validate it with https://search.google.com/structured-data/testing-tool), it has a good chance of being indexed.
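Concretely, annotating the homepage means embedding the JSON-LD in the page's HTML inside a `<script type="application/ld+json">` element, which is how Google's structured-data documentation describes publishing JSON-LD. This is a minimal sketch, assuming the homepage is static HTML we generate ourselves; the helper name `jsonld_script_tag` and the metadata values are hypothetical. The rendered page can then be checked with the testing tool linked above.

```python
import json


def jsonld_script_tag(metadata: dict) -> str:
    """Render metadata as a JSON-LD <script> element for an HTML page."""
    payload = json.dumps(metadata, indent=2)
    # Escape "</" as the JSON escape "<\/" so a value containing "</script>"
    # cannot close the element early; JSON parsers read "\/" back as "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    meta = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "example-dataset",  # hypothetical placeholder
        "description": "A hypothetical dataset used only to illustrate the markup.",
    }
    # Paste the output into the dataset homepage's HTML (typically in <head>).
    print(jsonld_script_tag(meta))
```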

bzz, Oct 25 '18