Record for simulated Open Data dataset for dedicated Jet Collection - 2020

Open artfisica opened this issue 3 years ago • 8 comments

Hi!

We would like to add a new record to the CERN Open Data portal.

This time it is a dedicated simulated sample for an educational project on jet reconstruction.

There are ten (10) ROOT files of about 2.1 GB each. As mentioned through other channels, the files are already in the ATLAS upload area.

We will use the recently created record 15009 as a guide for creating the corresponding JSON file.

Thanks and cheers, Arturo

CERN-internal ref: https://its.cern.ch/jira/browse/MYATLAS-53

artfisica avatar Apr 14 '21 13:04 artfisica

Thanks, if you start preparing a new JSON file such as atlas-2020-exactly2lep.json for the qa branch, please use the following identifiers that I have just reserved:

  • record ID = 15010
  • DOI = 10.7483/OPENDATA.ATLAS.L806.5CKU

I'll also paste the file information here once the files are transferred.

tiborsimko avatar Apr 14 '21 16:04 tiborsimko

Thank you so much, @tiborsimko,

I also renamed the files as you recommended, in case you have not moved them yet:

> ls /.../.../.../mc-jets-2020
mc_jets-2020.part01.root
mc_jets-2020.part02.root
mc_jets-2020.part03.root
mc_jets-2020.part04.root
mc_jets-2020.part05.root
mc_jets-2020.part06.root
mc_jets-2020.part07.root
mc_jets-2020.part08.root
mc_jets-2020.part09.root
mc_jets-2020.part10.root

I already have the JSON "ready" with the description and the details you mentioned here.

artfisica avatar Apr 20 '21 11:04 artfisica

Thanks, files have been moved to the production destination. Here is the file snippet to use in your JSON:

  "files": [
    {
      "checksum": "adler32:78551f76",
      "size": 2227206546,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part01.root"
    },
    {
      "checksum": "adler32:a8bb1ca5",
      "size": 2229449665,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part02.root"
    },
    {
      "checksum": "adler32:49b1c9f1",
      "size": 2225466679,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part03.root"
    },
    {
      "checksum": "adler32:b3d3aa49",
      "size": 2226584660,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part04.root"
    },
    {
      "checksum": "adler32:f9c7c88f",
      "size": 2225583813,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part05.root"
    },
    {
      "checksum": "adler32:30f3b981",
      "size": 2231674406,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part06.root"
    },
    {
      "checksum": "adler32:df76ad3c",
      "size": 2228512434,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part07.root"
    },
    {
      "checksum": "adler32:f7a2f3d6",
      "size": 2229537109,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part08.root"
    },
    {
      "checksum": "adler32:6d8b2470",
      "size": 2228550482,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part09.root"
    },
    {
      "checksum": "adler32:566c6b6d",
      "size": 2229025592,
      "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part10.root"
    }
  ]
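For anyone who wants to check a downloaded copy against the `checksum` values above, here is a minimal sketch using Python's standard `zlib.adler32`. The function name `adler32_of_file` and the chunk size are illustrative choices, and the lowercase, zero-padded 8-digit hex format is an assumption inferred from the values shown in the snippet.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return a file's checksum as 'adler32:<8 hex digits>'.

    The output format (lowercase hex, zero-padded to 8 digits) is assumed
    from the checksum strings quoted in this thread.
    """
    value = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as handle:
        # Read in chunks so a 2 GB ROOT file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return f"adler32:{value & 0xFFFFFFFF:08x}"
```

Calling `adler32_of_file("mc_jets-2020.part01.root")` on a local copy (the local path is hypothetical) and comparing the result with the `checksum` string for that file would confirm the transfer.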

Also, in the distribution field, you can enter 10 files, total size 22281591386. (Dunno about the number of events.)
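The totals quoted above can be reproduced from the `size` values in the `files` snippet. The sketch below is only a cross-check; the helper name `summarize_distribution` and the output keys `number_files` and `size` are assumptions made for illustration, not confirmed field names from the record schema.

```python
# Sizes in bytes, copied from the "files" snippet in this thread.
sizes = [
    2227206546, 2229449665, 2225466679, 2226584660, 2225583813,
    2231674406, 2228512434, 2229537109, 2228550482, 2229025592,
]
files = [{"size": size} for size in sizes]


def summarize_distribution(files):
    """Count the files and add up their sizes in bytes."""
    return {
        "number_files": len(files),  # key name is an assumption
        "size": sum(entry["size"] for entry in files),
    }


print(summarize_distribution(files))
# → {'number_files': 10, 'size': 22281591386}
```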

Is the file destination OK? Would you like me to change anything?

tiborsimko avatar May 07 '21 12:05 tiborsimko

Hi @tiborsimko, Thank you very much!

I added the details to the JSON file. I also added the number of events, among other information. I think the distribution is OK.

Here is the JSON file in my fork of the repo, in case you would like to take a look before I proceed with the PR:

https://github.com/artfisica/opendata.cern.ch/blob/master/cernopendata/modules/fixtures/data/records/atlas-2020-mc-jet-reconstruction.json

Of course, any other comment or suggestion is very much welcome :)

Thanks again, and apologies for the delay, Arturo

artfisica avatar May 26 '21 22:05 artfisica

Hi @artfisica, the record looks OK, but please beware that on line 84 there is a ^L (form feed) character, which breaks JSON validity. (You can check with jsonlint, for example.) Please make the PR and I'll fix and merge. Thanks!
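As an alternative to jsonlint, a stray control character such as ^L (form feed, `\x0c`) can be located with a few lines of Python. This is a minimal sketch: the helper names `find_control_chars` and `is_valid_json` are made up for illustration, and it only flags characters below U+0020 other than tab, newline, and carriage return.

```python
import json

# Whitespace control characters that may legitimately appear between JSON tokens.
ALLOWED = {"\t", "\n", "\r"}


def find_control_chars(text):
    """Return (line, column, repr) for each disallowed control character."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on form feed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) < 0x20 and char not in ALLOWED:
                hits.append((line_no, col_no, repr(char)))
    return hits


def is_valid_json(text):
    """Return True if the text parses with Python's strict JSON parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Reading the record file into a string and passing it to `find_control_chars` would list the line and column of each offending character, and `is_valid_json` can confirm the file parses once they are removed.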

P.S. Also, would you like to tag the record with the "datascience" keyword so that it would appear in the Data Science search?

tiborsimko avatar Jun 01 '21 12:06 tiborsimko

Hi @tiborsimko,

Thank you very much! I just updated the file to repair the bad formatting and took the opportunity to change the text a bit, for example, using the word "million".

Regarding your suggestion, yes, please :) let's tag the record with the "datascience" keyword. Can we also add the "education" keyword?

  • Related question: are the keywords something I should add to the JSON?

artfisica avatar Jun 01 '21 12:06 artfisica

Yes, you can add to the JSON file the following snippet:

    ...
    "keywords": [
      "datascience",
      "education"
    ],
    ...

tiborsimko avatar Jun 01 '21 12:06 tiborsimko

Hi @tiborsimko,

Thank you, I have added that part to the JSON file, created a dedicated branch, and opened the PR: https://github.com/cernopendata/opendata.cern.ch/pull/3114

Hopefully I got it right this time.

artfisica avatar Jun 01 '21 15:06 artfisica