opendata.cern.ch

API: differentiate records with file_indexes

Open avivace opened this issue 3 years ago • 2 comments

I'm playing around with this endpoint:

https://opendata.cern.ch/api/records/{record-id}

Sometimes the files are actually index lists that need to be "unpacked" to get the final paths (e.g. record "1"), while other records list the paths directly (e.g. record "5200"), even though both reference the same schema (http://opendata.cern.ch/schema/records/record-v1.0.0.json).

Currently, our strategy for working out which of the two representations a record uses is based on the file names (if a file name ends with file_index.json, unpack it and ignore the other txt entries), but I guess another key in the JSON could expose this difference?
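A minimal sketch of that name-based heuristic, assuming each file entry is a dict whose `key` field holds the file name. The `key` field name, the `classify_files` helper and the sample file names are illustrative assumptions, not something confirmed in this thread:

```python
def classify_files(files):
    """Guess a record's representation from its file names alone.

    Returns ("indexed", json_index_entries) when at least one entry name
    ends with "file_index.json"; the matching *.txt index entries are
    ignored in that case. Otherwise returns ("direct", all_entries).
    """
    # Assumption: the file name is stored under the "key" field.
    json_indexes = [f for f in files if f.get("key", "").endswith("file_index.json")]
    if json_indexes:
        return "indexed", json_indexes
    return "direct", list(files)


# Made-up entries, only to show the two outcomes.
indexed_example = [
    {"key": "sample_file_index.json"},
    {"key": "sample_file_index.txt"},
]
direct_example = [{"key": "sample_data.root"}]

kind_a, entries_a = classify_files(indexed_example)
kind_b, entries_b = classify_files(direct_example)
```

The weak point is that the decision rests entirely on a naming convention, which is why a dedicated key in the record would be more robust.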

I am aware of the "type" key for the files, but it is missing for entries that are not index.txt or index.json files.
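To illustrate the limitation, here is a rough sketch that groups entries by their "type" value. It assumes the values are the strings "index.json" and "index.txt" and that plain data files simply have no "type" key; both are assumptions, and `group_by_type` is a hypothetical helper:

```python
def group_by_type(files):
    """Group file entries by their "type" value.

    Entries without a "type" key end up under None, so nothing positively
    marks them as "direct" data files.
    """
    groups = {}
    for entry in files:
        groups.setdefault(entry.get("type"), []).append(entry)
    return groups


# Made-up entries; the "type" values are assumed, not taken from the API.
sample = [
    {"key": "sample_file_index.json", "type": "index.json"},
    {"key": "sample_file_index.txt", "type": "index.txt"},
    {"key": "sample_data.root"},
]
groups = group_by_type(sample)
has_index = any(t is not None for t in groups)
```

With this approach a record counts as "direct" only because no typed entry is present, not because the record says so explicitly.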

Does that make sense or am I missing something? Thanks for any clarification.

avivace, Jun 17 '21 13:06

I'm not sure if this answers your question, but at least for CMS we have the collections field:

  • Primary datasets (i.e. collision data), search

        "collections": [
          "CMS-Primary-Datasets"
        ], 
    
  • Simulated datasets, search

        "collections": [
          "CMS-Simulated-Datasets"
        ], 
    
  • Derived datasets, i.e. the files derived from some of the datasets of the two categories above, search

        "collections": [
          "CMS-Derived-Datasets"
        ], 
    

The last category is quite heterogeneous, and there's a related issue about the file listings there: #2846
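As a rough sketch, the three "collections" values listed above could be mapped to a category like this. The `cms_category` helper and the short category labels are hypothetical, and the code assumes the record metadata is a dict with a "collections" list as in the snippets above:

```python
# Collection names as quoted above; the short labels are made up here.
CMS_CATEGORIES = {
    "CMS-Primary-Datasets": "primary",
    "CMS-Simulated-Datasets": "simulated",
    "CMS-Derived-Datasets": "derived",
}


def cms_category(metadata):
    """Return the CMS dataset category for a record, or None if none matches."""
    for name in metadata.get("collections", []):
        if name in CMS_CATEGORIES:
            return CMS_CATEGORIES[name]
    return None


category = cms_category({"collections": ["CMS-Derived-Datasets"]})
```

This only tells the dataset categories apart; it does not by itself say whether a record's files are index lists or direct paths.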

@tiborsimko can comment on this best.

katilp, Jun 22 '21 09:06

I'm tempted to close this old ticket. The submitter seems to be happy with the comment from Kati. Any objections?

psaiz, Apr 18 '24 08:04