opendata.cern.ch
Record for simulated Open Data dataset for dedicated Jet Collection - 2020
Hi!
We would like to add a new record to the CERN Open Data portal.
This time it is a dedicated simulated sample for an educational project on jet reconstruction.
There are ten (10) ROOT files of about 2.1 GB each. As mentioned through other channels, the files are already in the ATLAS upload area.
We will use the recently created record 15009 as a guide for creating the corresponding JSON file.
Thanks and cheers, Arturo
CERN-internal ref: https://its.cern.ch/jira/browse/MYATLAS-53
Thanks, if you start preparing a new JSON file such as atlas-2020-exactly2lep.json for the qa branch, please use the following identifiers that I have just reserved:
- record ID = 15010
- DOI = 10.7483/OPENDATA.ATLAS.L806.5CKU
I'll copy-paste here also the file information once the files are transferred.
Thank you so much, @tiborsimko,
I also renamed the files as you recommended, in case you have not moved them yet:
> ls /.../.../.../mc-jets-2020
mc_jets-2020.part01.root
mc_jets-2020.part02.root
mc_jets-2020.part03.root
mc_jets-2020.part04.root
mc_jets-2020.part05.root
mc_jets-2020.part06.root
mc_jets-2020.part07.root
mc_jets-2020.part08.root
mc_jets-2020.part09.root
mc_jets-2020.part10.root
I already have the JSON "ready" with the description and the details you mentioned here.
Thanks, files have been moved to the production destination. Here is the file snippet to use in your JSON:
"files": [
  {
    "checksum": "adler32:78551f76",
    "size": 2227206546,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part01.root"
  },
  {
    "checksum": "adler32:a8bb1ca5",
    "size": 2229449665,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part02.root"
  },
  {
    "checksum": "adler32:49b1c9f1",
    "size": 2225466679,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part03.root"
  },
  {
    "checksum": "adler32:b3d3aa49",
    "size": 2226584660,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part04.root"
  },
  {
    "checksum": "adler32:f9c7c88f",
    "size": 2225583813,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part05.root"
  },
  {
    "checksum": "adler32:30f3b981",
    "size": 2231674406,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part06.root"
  },
  {
    "checksum": "adler32:df76ad3c",
    "size": 2228512434,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part07.root"
  },
  {
    "checksum": "adler32:f7a2f3d6",
    "size": 2229537109,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part08.root"
  },
  {
    "checksum": "adler32:6d8b2470",
    "size": 2228550482,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part09.root"
  },
  {
    "checksum": "adler32:566c6b6d",
    "size": 2229025592,
    "uri": "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-05-26/JetRecoDataset/mc_jets-2020.part10.root"
  }
]
Also, in the distribution field, you can enter 10 files, total size 22281591386. (Dunno about the number of events.)
Is the file destination OK? Would you like me to change anything?
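For anyone preparing a similar record, here is a minimal Python sketch of how the checksum strings and the distribution totals in this comment could be reproduced. It is not the portal's own tooling: the helper names `adler32_label` and `summarize_distribution` are hypothetical, and the `number_files` and `size` key names are assumed rather than taken from this thread. Only the sizes are copied from the snippet above.

```python
import zlib


def adler32_label(path, chunk_size=1024 * 1024):
    """Return the Adler-32 of a local file as an 'adler32:xxxxxxxx' string."""
    value = 1  # Adler-32 starts from 1, which is also zlib's default
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte ROOT file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return f"adler32:{value & 0xFFFFFFFF:08x}"


def summarize_distribution(files):
    """Count the file entries and add up their sizes in bytes.

    The key names used here are an assumption made for this sketch.
    """
    return {
        "number_files": len(files),
        "size": sum(entry["size"] for entry in files),
    }


# Sizes in bytes, copied from the ten file entries quoted above.
sizes = [
    2227206546, 2229449665, 2225466679, 2226584660, 2225583813,
    2231674406, 2228512434, 2229537109, 2228550482, 2229025592,
]
files = [{"size": size} for size in sizes]
summary = summarize_distribution(files)
print(summary)
```

The summed size comes out to the 22281591386 bytes quoted above for the 10 files. `adler32_label` would have to be run against local copies of the ROOT files to compare with the listed checksums.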
Hi @tiborsimko, Thank you very much!
I added the details to the JSON file. I also added the number of events, among other information. I think the distribution is OK.
Here is the JSON file in my fork of the repo, in case you would like to take a look before proceeding with the PR: https://github.com/artfisica/opendata.cern.ch/blob/master/cernopendata/modules/fixtures/data/records/atlas-2020-mc-jet-reconstruction.json
Of course, any other comment or suggestion is very much welcome :)
Thanks again and apologies for the delay, Arturo
Hi @artfisica, the record looks OK, but please beware that on line 84 there is a ^L (form feed) character, which breaks JSON compatibility. (You can check with jsonlint, for example.) Please make the PR and I'll fix and merge. Thanks!
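As an alternative to jsonlint, a stray control character such as ^L can also be located with a few lines of Python. This is only a sketch: the helper names `find_control_chars` and `is_valid_json` and the two-field sample are hypothetical, not taken from the actual record file.

```python
import json

# The only control characters JSON accepts as whitespace between tokens.
ALLOWED = {"\t", "\n", "\r"}


def find_control_chars(text):
    """Return (line, column, repr) for each disallowed ASCII control character."""
    hits = []
    # Split on "\n" only: str.splitlines() would treat a form feed as a
    # line break itself and hide the character we are looking for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) < 0x20 and char not in ALLOWED:
                hits.append((line_no, col_no, repr(char)))
    return hits


def is_valid_json(text):
    """Return True if Python's json module can parse the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical sample with a form feed (^L, "\x0c") inside a string value.
sample = '{\n  "title": "Jets\x0c sample"\n}'
print(find_control_chars(sample))
print(is_valid_json(sample))
print(is_valid_json(sample.replace("\x0c", "")))
```

Run on the real file, `find_control_chars(open(path, encoding="utf-8").read())` would report the line and column of each offending character, which makes the fix easy to locate in an editor.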
P.S. Also, would you like to tag the record with the "datascience" keyword so that it would appear in the Data Science search?
Hi @tiborsimko,
Thank you very much! I just updated the file to repair the bad formatting and took the opportunity to change the text a bit, for example by using the word "million".
Regarding your suggestion, yes, please :) let's tag the record with the "datascience" keyword. Can we also add the "education" keyword?
- Related question: are the keywords something I should add to the JSON?
Yes, you can add to the JSON file the following snippet:
...
"keywords": [
  "datascience",
  "education"
],
...
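If the record is edited by script rather than by hand, the same change could be sketched in Python as below. The `add_keywords` helper is hypothetical, and the `recid` and `doi` key names in the sample record are assumptions for illustration. Only the two identifier values and the keyword list come from this thread.

```python
import json


def add_keywords(record, new_keywords):
    """Return a copy of the record with new_keywords merged in, without duplicates."""
    merged = list(record.get("keywords", []))
    for keyword in new_keywords:
        if keyword not in merged:
            merged.append(keyword)
    return {**record, "keywords": merged}


# Stand-in record; the key names are assumed, the values are the reserved identifiers.
record = {"recid": "15010", "doi": "10.7483/OPENDATA.ATLAS.L806.5CKU"}
updated = add_keywords(record, ["datascience", "education"])
print(json.dumps(updated, indent=2, sort_keys=True))
```

Writing the result back with `json.dumps` also guarantees that the output is valid JSON, which avoids hand-editing slips.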
Hi @tiborsimko,
Thank you. I have added that part to the JSON file, created a dedicated branch, and opened the PR: https://github.com/cernopendata/opendata.cern.ch/pull/3114
Hopefully I got it right this time.