guidance on large Croissant files, especially in `<head>`
Could the spec offer guidance on large Croissant files, especially when they are added to the `<head>` of a dataset landing page, greatly increasing its size?
This is not a new problem for us (Dataverse). To support Google Dataset Search, we already include Schema.org content, which can be quite large, in the `<head>` of pages. A dataset with 25,310 files has a Schema.org file that is 4.4 MB, mostly due to the long file listing under "distribution".
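To check which part of a record dominates the payload, one could measure the compact serialized size of each top-level key. A minimal sketch, assuming the Schema.org or Croissant JSON-LD has already been loaded into a Python dict; the helper name `size_by_key` and the toy document are hypothetical.

```python
import json


def size_by_key(doc: dict) -> dict:
    """Return the compact UTF-8 byte size of each top-level value, largest first."""
    sizes = {
        key: len(json.dumps(value, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))
        for key, value in doc.items()
    }
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Toy record: a short description plus a long "distribution" list.
    doc = {
        "@type": "Dataset",
        "name": "Example",
        "distribution": [
            {"@type": "DataDownload", "name": f"file_{i}.csv", "contentUrl": f"https://example.org/f/{i}"}
            for i in range(1000)
        ],
    }
    for key, size in size_by_key(doc).items():
        print(f"{key}: {size} bytes")
```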
Croissant exacerbates the problem: the same dataset yields a Croissant file that is 7.1 MB. This is a lot of extra weight for a dataset landing page.
Can you please suggest some best practices? What is a reasonable upper limit for a Croissant file that will go in the `<head>` of a page? When we reach the limit, what should we do? Only show a few files under "distribution"?
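One way to act on "only show a few files" would be to keep the full record for the API and embed a copy whose "distribution" list is shortened until the serialized JSON fits a byte budget. A minimal sketch: the 500 kB budget, the function names `compact_size` and `trim_for_head`, and the choice to keep the first files in order are all assumptions for illustration, not Croissant or Schema.org requirements.

```python
import copy
import json

HEAD_BUDGET_BYTES = 500_000  # hypothetical limit, not taken from any spec


def compact_size(doc: dict) -> int:
    """UTF-8 byte size of the compact (no extra whitespace) JSON serialization."""
    return len(json.dumps(doc, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def trim_for_head(doc: dict, budget: int = HEAD_BUDGET_BYTES) -> dict:
    """Return a copy of doc whose "distribution" list is cut to fit the byte budget.

    The input is not modified, so the full record can still be served via the API.
    Files are kept in their original order; only a prefix of the list survives.
    """
    trimmed = copy.deepcopy(doc)
    files = trimmed.get("distribution")
    if not isinstance(files, list) or compact_size(trimmed) <= budget:
        return trimmed  # nothing to trim, or it already fits
    # Size grows with the number of kept files, so binary-search the longest prefix that fits.
    lo, hi = 0, len(files)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        trimmed["distribution"] = files[:mid]
        if compact_size(trimmed) <= budget:
            lo = mid
        else:
            hi = mid - 1
    # lo == 0 means even an empty list may not fit; the caller should check the result.
    trimmed["distribution"] = files[:lo]
    return trimmed
```

Each probe re-serializes the whole record, so a 25,310-file listing takes roughly 15 serializations. How a consumer would learn that the embedded listing is incomplete is left open here.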
Again, I'm mostly talking about the content that goes into the `<head>` of a page. A 7.1 MB Croissant file is fine when it is downloaded separately from the dataset landing page, via the API.
Thanks!
A dataset with thousands of files is unusual, and the size of the metadata reported in this case is actually not that bad (DDI would be considerably larger). That said, I do understand that this is an issue.
This issue also arises in datasets with a very large number of variables or lengthy code lists. We're also discussing the addition of summary statistics, which would significantly increase the size.
So there are several relevant use cases here.
@pdurbin: Could you provide a copy of the schema.org and croissant files you are referring to (or link to the page)?
@kulnor Sure, here you go (two versions of each: raw and pretty-printed with `jq`):
- croissant-raw.json 7.0 MB
- croissant-pretty.json 8.8 MB
- jsonld-raw.json 4.4 MB
- jsonld-pretty.json 5.6 MB
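The gap between the raw and pretty-printed sizes above (7.0 MB vs 8.8 MB for Croissant) is mostly indentation and newlines, which suggests embedding only the compact form in `<head>`. A small sketch comparing the two serializations with Python's standard `json` module; the toy document and the helper name `serialized_sizes` are hypothetical and stand in for the attached files.

```python
import json


def serialized_sizes(doc: dict) -> tuple[int, int]:
    """Return (compact, pretty) UTF-8 byte sizes for the same JSON document."""
    compact = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    pretty = json.dumps(doc, indent=2, ensure_ascii=False)
    return len(compact.encode("utf-8")), len(pretty.encode("utf-8"))


if __name__ == "__main__":
    # Toy document standing in for a real Croissant file.
    doc = {"distribution": [{"@type": "cr:FileObject", "name": f"f{i}.csv"} for i in range(100)]}
    compact, pretty = serialized_sizes(doc)
    print(f"compact: {compact} B, pretty: {pretty} B, overhead: {pretty / compact - 1:.0%}")
```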
It's a lot of payload in the HTML! Maybe we should consider Signposting. 😄
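With Signposting, the landing page would carry a small pointer to the metadata rather than the metadata itself, using the `describedby` link relation, either as an HTTP `Link` header or as a `<link>` element in `<head>`. A sketch that builds both forms; the metadata URL is a hypothetical placeholder, and advertising Croissant as plain `application/ld+json` (rather than with some extra profile parameter) is an assumption, not something the spec currently says.

```python
from html import escape


def describedby_link_header(url: str, media_type: str = "application/ld+json") -> str:
    """Build an HTTP Link header value pointing at an external metadata record."""
    return f'<{url}>; rel="describedby"; type="{media_type}"'


def describedby_link_element(url: str, media_type: str = "application/ld+json") -> str:
    """Build the equivalent <link> element for the page's <head>, HTML-escaping attributes."""
    return f'<link rel="describedby" type="{escape(media_type)}" href="{escape(url)}">'


if __name__ == "__main__":
    # Hypothetical location of the full Croissant file for one dataset.
    url = "https://example.org/dataset/123/croissant.json"
    print("Link:", describedby_link_header(url))
    print(describedby_link_element(url))
```

The page weight then stays constant regardless of file count, at the cost of requiring consumers to make a second request.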
I'm glad to see the 1.1 label got added to this issue! 🎉 Thanks! ❤️