opendata.cern.ch
API: differentiate records with file_indexes
I'm playing around with this endpoint:
https://opendata.cern.ch/api/records/{record-id}
Sometimes the files are actually index lists that need to be "unpacked" to get the final paths (e.g. record "1"), while other records list the paths directly (e.g. record "5200"). In both cases the referenced schema is the same one (http://opendata.cern.ch/schema/records/record-v1.0.0.json).
Currently, our strategy for working out which of the two representations a record uses is based on the file names: if a file name ends with file_index.json, we unpack it and ignore the other txt entries. But I guess another key in the JSON could expose this difference?
I am aware of the "type" key for the files, but it is missing for entries that are not index.txt or index.json files.
Does that make sense or am I missing something? Thanks for any clarification.
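For reference, the file-name heuristic described above could be sketched roughly as follows. This is only an illustration under assumptions: it assumes each file entry is a dict whose name is stored under a "key" field, that "type" (when present) takes the values "index.json" or "index.txt", and that the txt entries to ignore are the ones ending with file_index.txt. The helper name split_file_entries and the sample file names are hypothetical.

```python
def split_file_entries(files):
    """Separate index files (to be unpacked) from directly listed files.

    Assumption: each entry is a dict with the file name under "key" and an
    optional "type" that is "index.json" or "index.txt" for index files.
    """
    index_json, direct = [], []
    for entry in files:
        name = entry.get("key", "")
        kind = entry.get("type")
        if kind == "index.json" or name.endswith("file_index.json"):
            # JSON index: must be fetched and unpacked to get the final paths
            index_json.append(entry)
        elif kind == "index.txt" or name.endswith("file_index.txt"):
            # txt index: same content as the JSON index, so skip it
            continue
        else:
            # Anything else is treated as a directly listed file
            direct.append(entry)
    return index_json, direct


# Hypothetical entries, only to show the shape of the input
sample = [
    {"key": "example_file_index.json", "type": "index.json"},
    {"key": "example_file_index.txt", "type": "index.txt"},
    {"key": "notes.txt"},
    {"key": "data.root"},
]
index_entries, direct_entries = split_file_entries(sample)
```

A record whose index_entries list is non-empty would then be treated as the "unpack" representation, and one with only direct_entries as the direct-listing representation.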
I'm not sure if this answers your question, but at least for CMS, we have the collections field:
- Primary datasets (i.e. collision data), search: "collections": [ "CMS-Primary-Datasets" ],
- Simulated datasets, search: "collections": [ "CMS-Simulated-Datasets" ],
- Derived datasets, i.e. the files derived from some of the datasets of the two categories above, search: "collections": [ "CMS-Derived-Datasets" ],
The last category is quite heterogeneous, and there's a related issue about its file listings in #2846.
@tiborsimko can comment best.
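As a rough sketch, the three CMS collection names mentioned above could be mapped to a category like this. It assumes the record metadata is available as a dict with a "collections" list; the helper name cms_category and the short category labels are hypothetical.

```python
# Collection names as quoted in the comment above; the labels are illustrative
CMS_COLLECTIONS = {
    "CMS-Primary-Datasets": "primary",
    "CMS-Simulated-Datasets": "simulated",
    "CMS-Derived-Datasets": "derived",
}


def cms_category(metadata):
    """Return a category label for a CMS record, or None if no collection matches.

    Assumption: `metadata` is a dict with a "collections" list of strings.
    """
    for name in metadata.get("collections", []):
        if name in CMS_COLLECTIONS:
            return CMS_COLLECTIONS[name]
    return None


category = cms_category({"collections": ["CMS-Derived-Datasets"]})
```

Since the derived category is heterogeneous, a "derived" result alone would probably not be enough to tell how the files are listed.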
I'm tempted to close this old ticket. The submitter seems to be happy with the comment from Kati. Any objections?