Datasets should be annotated with JSON-LD
See https://developers.google.com/search/docs/data-types/dataset .
@campoy has been working on a proposal to add metadata to our datasets: https://github.com/src-d/guide/pull/163. It would be good to add this info to that discussion.
I'm curious, what are the benefits of JSON-LD over other formats such as PMML?
@campoy Is PMML used for dataset metadata at all?
Anyway, I think the issue name is misleading: JSON-LD is the format, but schema.org/Dataset (+ Google extensions?) is the actual schema. It seems that Google will start using it to discover datasets from 3rd parties, so that alone might signal likely future adoption, and schema.org vocabularies also tend to end up more widely used in the long term.
With respect to the format itself, we might prefer JSON (afaik JSON-LD is valid JSON) for convenient parsing of metadata rather than XML.
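To illustrate the parsing point, here is a minimal sketch in Python, assuming a schema.org/Dataset description stored as JSON-LD. The dataset name, description, and URL are placeholders I made up, not metadata for any of our actual datasets. It only shows that the standard `json` module can read the document without any JSON-LD specific tooling.

```python
import json

# Hypothetical schema.org/Dataset description; every value is a placeholder.
dataset_jsonld = """
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example source code dataset",
  "description": "Placeholder description of an example dataset.",
  "url": "https://example.com/datasets/example",
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
"""

# JSON-LD is plain JSON, so the standard library parser is enough.
metadata = json.loads(dataset_jsonld)

print(metadata["@type"], "-", metadata["name"])
```

Note that this treats the document as ordinary JSON; it does not resolve the `@context` or do any linked-data processing.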
I don't have much experience with this, so if @smola has a preference for JSON-LD and Google is also using it, I say let's go with that.
Note that I have no strong preference for JSON-LD itself, since I never really used it. But I have a preference for adopting schema.org et al vocabularies as well as JSON over XML.
Related: https://ai.googleblog.com/2018/09/building-google-dataset-search-and.html
So, Where is Your Dataset? It is probably clear by now that Dataset Search is only as good as the metadata that exists on the Web pages for datasets.
The most common answer to the question of why a specific dataset does not show up in our results is that the Web page for that dataset does not have any markup. Just pop that page into the Structured Data Testing Tool and you will see whether the markup is there. If you don't see any markup there, and you own the page, you can add it
Yes, basically we could just annotate the dataset homepage with structured information (it can be checked with https://search.google.com/structured-data/testing-tool) and it would have a good chance of being indexed.
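A rough sketch of what annotating a homepage could look like, assuming the page is generated by a script we control. JSON-LD markup is normally embedded in a `<script type="application/ld+json">` element; the helper name `jsonld_script_tag` and the sample metadata below are hypothetical.

```python
import json


def jsonld_script_tag(metadata: dict) -> str:
    """Wrap a metadata dict in a JSON-LD script element for an HTML page."""
    payload = json.dumps(metadata, indent=2)
    # Escape "</" so a value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Placeholder metadata, not a description of a real dataset.
example_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example source code dataset",
    "description": "Placeholder description of an example dataset.",
}

print(jsonld_script_tag(example_metadata))
```

The resulting element would go into the dataset homepage's HTML, and the page could then be run through the testing tool to confirm the markup is detected.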