[Proposal] Metadata file for experiments?
Hi,
I would like to ask for opinions on whether it would be a good idea to have a metadata (JSON) file for experiments (in the root folder or in the config folder). It could look like this:
[
    {
        "exp": "s509",
        "detectors": {
            "neuland": {
                "onspill_tpat": 1,
                "offspill_tpat": 14,
                "NumOfPlanes": 26
            }
        }
    },
    {
        "exp": "s522",
        "detectors": {
            "neuland": {
                "onspill_tpat": 1,
                "offspill_tpat": 11,
                "NumOfPlanes": 26
            }
        }
    },
    {
        "exp": "s118",
        "detectors": {
            "neuland": {
                "onspill_tpat": 1,
                "offspill_tpat": 14,
                "NumOfPlanes": 26
            }
        }
    }
]
The JSON file can then be read by CMake. When configuring with CMake, the experiment ID can be set with:
cmake .. -DR3BEXP="s118"
and the corresponding C++ global constants can be set accordingly at compile time. We could also make the default value of R3BEXP the latest experiment if none is specified.
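A rough sketch of how the CMake side could look (the default value and the macro name R3B_EXPERIMENT_ID are assumptions here, not existing code):

```cmake
# Hypothetical CMakeLists.txt fragment. Defaults to the latest experiment
# if -DR3BEXP is not given on the command line.
set(R3BEXP "s118" CACHE STRING "Experiment ID used to select metadata")

# Expose the experiment ID to C++ as a compile-time constant.
add_compile_definitions(R3B_EXPERIMENT_ID="${R3BEXP}")
```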
These are my initial thoughts. Any suggestions and comments are welcome.
Maybe setting the expID at compile time is a bad idea, though. Doing so would bind all binary files of R3BRoot to a specific experiment, which is very bad for both executables and ROOT macros if the user isn't aware of which experiment R3BRoot was compiled for. So it would be better to read the metadata file at run time.
This information should be loaded from a parameter container, something like R3BTpatMappingPar. But the number of planes for Neuland is already in the R3BNeulandMappingPar container. You could also include the parameters "offspill_tpat" and "onspill_tpat" in the latter.
Ok, but do you think it's a good idea to store it in a JSON file (in the R3BParams repo), which is human-readable? The ROOT parameter file could then be generated from this file.
Yes, if we can manage it with FairRoot; I am not sure. Here you can see the current format that we are using to store the positions of each detector: https://github.com/R3BRootGroup/R3BParams_s091_s118/blob/dev/parameters/CalibPar_522.par This is also human-readable. Do you think that the JSON format is better?
Yes, JSON is pretty much the standard file format for storing metadata, and nlohmann/json is already included in FairSoft. The problem with a text-based file (CalibPar_522.par) is that it is human-readable but not easily readable for computers. For example, if someone needs to plot some data from CalibPar using Python or another language, they need to write their own parser. But with JSON, since most programming languages already provide JSON parsers, all they need to do is use an existing library.
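To illustrate the point, here is a minimal Python sketch using the metadata format proposed at the top of this thread (the lookup helper neuland_params is invented here for illustration):

```python
import json

# Metadata in the format proposed at the top of this thread.
metadata = json.loads("""
[
  {"exp": "s509", "detectors": {"neuland": {"onspill_tpat": 1, "offspill_tpat": 14, "NumOfPlanes": 26}}},
  {"exp": "s522", "detectors": {"neuland": {"onspill_tpat": 1, "offspill_tpat": 11, "NumOfPlanes": 26}}}
]
""")

def neuland_params(exp_id):
    """Look up the NeuLAND parameters for a given experiment ID."""
    for entry in metadata:
        if entry["exp"] == exp_id:
            return entry["detectors"]["neuland"]
    raise KeyError(f"no metadata for experiment {exp_id}")

print(neuland_params("s522")["offspill_tpat"])  # -> 11
```

No custom parser is needed; the standard library's json module does all the work.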
But we could allow both.
No problem, but I prefer to use both; otherwise we will have to rewrite all the existing parameter files.
I have one question though. Why do we create R3BParams for each experiment instead of just one repository?
Hi @YanzhaoW, the setup and parameters are experiment-dependent, as are the geometries and experiment-related macros, the typical contents of the parameter containers. Repositories can be created from a template and are self-consistent for those who want to analyse one experiment, as is usually the case for our students.
Hi, @hapol
I see. But Jose Luis also mentioned that we have some metadata files in text format which specify some geometrical and triggering information. To transform those metadata files into the .root parameter files, I guess we use some programs (or ROOT macros)? If so, do we also copy those programs into each R3BParams repository?
Yes, some files could (maybe) be the same in several repositories, and maybe some are identical in all repositories... But this should be an exception. Even geometry parameter files could/should be experiment-related. A "nominal" representation of the detectors is included in R3BRoot, but parameters are in general particular to each experiment. Moving a set of parameters from a text representation to a ROOT representation can be done via macros, using the first and second inputs (setFirstInput, setSecondInput) and the output (setOutput) of FairRuntimeDb. Examples of these macros are included in all detector directories of the macros repository, if I remember correctly. And also in the macros folder of the parameter repositories, of course (edited: I forgot to add this).
We read the file CalibPar_522.par directly with FairRuntimeDb and modify it by hand during the beam time to calibrate our detectors (especially for particle identification from FRS). Then we can create this file for different data runs (different values for the parameters) and read it with our macros depending on the RunID. For NeuLAND we work directly with Root files because the calibration of the TDCs requires a lot of information and it makes no sense to produce ASCII files.
I see. If I understand you correctly, the scripts that generate the ROOT parameter files from the text files should be independent of the experiment. If so, would it also be ok to put those generation scripts in R3BRoot (macros folder)?
For NeuLAND we work directly with Root files because the calibration of the TDCs requires a lot of information and it makes no sense to produce ASCII files.
If I'm not wrong, the NeuLAND TDC calibration only needs a few values: the number of planes, the off-spill tpat, the coarse time clock frequency (so far always the same) and the number of fine time channels (so far also the same). The trigger mapping can be inferred through the calibration process. So in reality just two numbers are needed. In this case, ASCII files would be the better option in my opinion.
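Such an ASCII file could then be as small as, for example (a purely hypothetical sketch; the file and key names are invented here):

```
# neuland_metadata.par (hypothetical)
NumOfPlanes: 26
offspill_tpat: 14
```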
The fine time calibration is an array with 1000 values, one value per bin: https://indico.gsi.de/event/5496/contributions/25455/attachments/18598/23317/FPGA_TDC.pdf For the calibration of some detectors, such as the plastic scintillators at FRS where we only have 3 TDC signals, we store the parameters directly in ASCII files. But for other detectors like the fibres, ToFD and NeuLAND, the default is a ROOT file because there are many TDC channels.
I'm not sure about this. In the NeuLAND calibration process, the relation between the fine time value (in ns) and the fine time channel (0~4086?) is determined during calibration, which only needs lmd files as input. I'm not sure these parameters are worth saving in the repository, as they are recalculated anyway whenever we do the calibration.
Yes, this depends on the electronics, but I think that NeuLAND also uses the class R3BTCalPar https://github.com/R3BRootGroup/R3BRoot/tree/dev/tcal, which manages the array with the 4086 values. But maybe @igasparic can also comment on this point.
@jose-luis-rs is right. There is a huge number of parameters for the NeuLAND tcal calibration: roughly 500 pairs of numbers for each fine time channel, and there are 2600 (PMTs) x 2 (leading/trailing) + 169 (one trigger time channel for each TAMEX card) = 5369 channels.
Hi, @igasparic
Yes, that's right. But isn't the NeuLAND tcal par calculated during the calibration process (we actually have a task class called R3BNeulandMapped2CalPar to do this job)? I think our workflow is to calculate the tcal par with this task during the calibration, using mapped data as the input, instead of downloading the tcal par from a repository. Please correct me if I'm wrong.
The parameters that should be saved in a repository instead, IMO, are those we can't calculate from R3BRoot tasks: something like the number of double planes, the off-spill trigger position and the trigIDMapping.
I am not sure I understand what you mean. We run the Mapped2CalPar class to produce the calibration parameters and we store them in files. To produce calibrated data with the Mapped2Cal class, we read the calibration parameters from the files and use them to obtain cal data from mapped data. How would you get cal data from mapped data without already existing calibration parameters?
We run Mapped2CalPar class to produce calibration parameters and we store them into files.
Yes, that's what I was saying. We create the tcal parameter file by running the task class Mapped2CalPar, instead of downloading the tcal parameter file from a repository. Therefore, I would say we don't need to store such a tcal par file in a repository, because we don't download it anyway.
I still don't understand what is your point. Where should we keep the tcal parameters?
Hi, I think I understand what @YanzhaoW means, and the problem in his reasoning. There are two main problems with your approach, @YanzhaoW:
- Obtaining the parameters each time without storing them is time-consuming, as you are calling tasks to retrieve parameters that are not part of the data analysis. Those classes exist only to obtain the parameters; once you have the parameters, you no longer apply the classes in the analysis flow.
- In many cases, even the data used to obtain the parameters is different from the data with which you want to analyse the experiment. In that case, the experimental data cannot be used to obtain the parameters.
Do these two points answer the question, or did I miss your point?
I still don't understand what is your point. Where should we keep the tcal parameters?
I think we usually store them in a ROOT file located somewhere on /lustre together with all the calibrated data (like /lustre/r3b/202205_s509/NeuLAND_MapData/Parameters for s509). My point is that it's not necessary to upload it to repos like R3BParams_s091_s118.
@hapol I see. Yeah, that could make things faster.
From the NeuLAND calibration macros I have seen, our current workflow is:
- run a macro that converts lmd to map-level data. Input: lmd files. Output: map-level data and Map2CalPar
- run a macro that converts map level to cal level. Input: map-level data and Map2CalPar. Output: cal-level data and Cal2HitPar
- run a macro that converts cal level to hit level. Input: cal-level data and Cal2HitPar. Output: hit-level data
We have been using this workflow in all previous experiments. I guess that's also the reason why the neuland folders in R3BParams_s501 and R3BParams_s522 are empty.
But we could start doing what you suggest. The workflow would then be simpler:
- run a macro that converts lmd files to hit-level data. Input: lmd files and all calibration parameters (downloaded from a GitHub repository). Output: hit-level data