[WIP] NeuLAND experimental feature branch
Documentation: https://yanzhaow.github.io/R3BRoot/neuland.html
Github: https://github.com/YanzhaoW/R3BRoot
Dockerhub: https://hub.docker.com/r/yanzhaowang/r3bdev/tags
List of features
- NeuLAND executable with JSON configuration
- New parameter class `R3BParRootFileIo`, which ignores the Run ID conflict and merges parameter files without creating a temporary parameter file
- Runtime build of the NeuLAND geometry without a geometry file
- Deploying NeuLAND documentation with Doxygen
- Implementing MPI for HPC cluster tasks
- ~~Github CD to automatically build R3BRoot Apptainer container and push to sylabs~~
- Github CD to automatically build FairRoot and R3BRoot container and push to Dockerhub
- Python interface classes for data analysis with PyROOT
- Simulation of Cal level data for input/output verification of NeuLAND calibration algorithms
- Implementation of point level data filter during the simulation
- Millepede algorithm for fine-tuning of NeuLAND cal2hit parameters
- More unit and integration tests
- More documentation
TODO list
- Complete documentation of NeuLAND simulation and calibration
- Disable data writing during the calibration to speed up the process
- Lua configuration (maybe)
- Complete tsync parameter fine-tuning and a coarse evaluation of energy-related parameters
- Add test coverage for NeuLAND code
- Achieve 90% plus test coverage for NeuLAND code (except legacy code).
FAQ
What is this branch?
This is my feature branch containing many experimental features, which will hopefully be merged into the dev branch. Each new commit passes all CI tests and static analyzers, such as clang-tidy and cpp-linter.
Is this branch completely divergent from the dev branch?
No, I will continuously rebase my branch onto the latest commit of the master branch. For the dev branch, I will also try to rebase onto the latest commit, except when the dev branch merges lots of new commits during a beamtime period. If that happens, I will wait a month or two.
How to use this branch?
```shell
git clone -b edwin_dev https://github.com/YanzhaoW/R3BRoot.git
```
Checklist:
- [x] Rebased against `dev` branch
- [x] My name is in the resp. CONTRIBUTORS/AUTHORS file
- [x] Followed the pull request guidelines and the Git workflow
- [x] Followed the seven rules of great commit messages
Hi @YanzhaoW
Be careful with this point:

> New parameter class `R3BParRootFileIo`, which ignores the Run ID conflict and merges parameter files without creating a temporary parameter file
We cannot ignore the Run-ID, as it is used to change the parameter settings in-flight. The idea is to calibrate the detectors for each setting, obtain the corresponding parameters for each detector, and then merge all of them into a single ROOT file, where each setting is identified by a Run-ID or Par-ID. The latter will be associated with a master timestamp range, which defines the valid range of use for the parameters.
In each experiment, we have (or should have!) someone responsible for the parameter repository, who manages the Par-ID for each setting. If this task is done properly, there's no reason to ignore it.
Hi, @jose-luis-rs
Thanks for your feedback. In the current implementation, the run IDs are scanned at the beginning, and the parameter set with the matching run ID is chosen first. I could add a flag to enforce Run ID compliance, such that the program aborts if no parameter set with the same run ID exists.
> The idea is to calibrate the detectors for each setting, obtain the corresponding parameters for each detector, and then merge all of them into a single ROOT file, where each setting is identified by a Run-ID or Par-ID.
First, I'm not sure about the purpose of merging parameters of different detectors into a single file. In the case of NeuLAND (other detectors may be different), the final output usable in higher-level data analysis is just event data, containing all kinds of neutron and hit information. The calibration parameters are only used at the intermediate level and are useless at the higher level.
Second, merging parameters of different run IDs into a single ROOT file also seems confusing to me. If the end goal is to use HPC clusters for the calibrations, the parameter output from each node must be written to a different file. Thus, we would have just one parameter set in each file, generated and used by data files with a certain range of run IDs.
> In each experiment, we have (or should have!) someone responsible for the parameter repository, who manages the Par-ID for each setting. If this task is done properly, there's no reason to ignore it.
I don't quite understand the necessity of a parameter repository. As I said, mapped data can be immediately calibrated on HPC clusters once they are available after the beam time, and we will then have hit-level data ready for the higher-level data analysis.
Hello @YanzhaoW
> In the current implementation, the run IDs are scanned at the beginning, and the parameter set with the matching run ID is chosen first. I could add a flag to enforce Run ID compliance, such that the program aborts if no parameter set with the same run ID exists.
This point might be relevant for certain experiments, but it doesn't reflect the general performance of our setups. For example, in the last experiment, we collected data for different incoming beams such as ²³F and ²⁵F. In the case of ²⁵F, the run IDs ranged from 151 to 161, but the par-ID remained the same because the detectors were very stable.
Now the code is confusing because, from our macros, we have to pass the run-Id to R3BUcesbSource and R3BFileSource via FairRunOnline and FairRunAna, respectively, using:

```cpp
auto run = new FairRunOnline();
run->SetRunId(fRunId);
```

and this run-Id is later used to select the calibration parameters. This is somewhat confusing because it actually represents the par-Id. It is something we eventually need to revise and rename properly in our classes.
In R3BFileSource you can find a line like this:

```cpp
fRunId = GetRunid(fEvtHeader->GetTimeStamp());
```
Here, the par-Id (sorry for the incorrect name above!) is defined based on the timestamp. This approach of assigning a par-Id as a function of the timestamp allows for automatic selection of calibration parameters, especially if we have a single ROOT file containing all of them. It also simplifies the code needed in the analysis macros to read the calibration parameters for all the detectors, see for example https://github.com/R3BRootGroup/R3BParams_g249/blob/dev/macros/exp/unpack/unpack_data_all_levels.C, because we only need to read one file.
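The timestamp-based selection described here could be sketched as follows. This is only a minimal illustration: the `ParIdTable` class, its members, and the `-1` sentinel are assumptions for this sketch, not the actual `R3BFileSource` internals. Each par-Id is keyed by the first timestamp of its validity range, and a lookup picks the range containing the event timestamp:

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch: each par-Id becomes valid at a start timestamp and
// stays valid until the next entry's start timestamp.
class ParIdTable
{
  public:
    void add_range(std::uint64_t start_timestamp, int par_id)
    {
        ranges_[start_timestamp] = par_id;
    }

    // Select the par-Id whose validity range contains the event timestamp,
    // mimicking the idea behind GetRunid(fEvtHeader->GetTimeStamp()).
    auto get_par_id(std::uint64_t timestamp) const -> int
    {
        auto it = ranges_.upper_bound(timestamp);
        if (it == ranges_.begin())
        {
            return -1; // timestamp earlier than any known range
        }
        return std::prev(it)->second;
    }

  private:
    std::map<std::uint64_t, int> ranges_;
};
```

With `std::map::upper_bound` the lookup is logarithmic, and the validity ranges cannot overlap by construction, since each range ends where the next one begins.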
> First, I'm not sure about the purpose of merging parameters of different detectors into a single file. In the case of NeuLAND (other detectors may be different), the final output usable in higher-level data analysis is just event data, containing all kinds of neutron and hit information. The calibration parameters are only used at the intermediate level and are useless at the higher level.
See my comments above. The macro "unpack_data_all_levels.C" is used to produce hit-level data for all detectors. Currently, it reads many input files containing calibration parameters. This is just a first version, as we are still working on the calibration parameters for all settings. Later, we will be able to merge all the parameters into a single ROOT file and then use the previous procedure based on timestamps to manage the parameters.
> Second, merging parameters of different run IDs into a single ROOT file also seems confusing to me. If the end goal is to use HPC clusters for the calibrations, the parameter output from each node must be written to a different file. Thus, we would have just one parameter set in each file, generated and used by data files with a certain range of run IDs.
I'm not sure what you mean here. For calibrations (i.e., obtaining the parameters), we can use regular PCs, and sometimes we need to extract the parameters manually by performing fits. I believe the idea behind using the HPC cluster is simply to generate high-level data for all detectors in a short time. In this process we will not produce any new parameters.
> I don't quite understand the necessity of a parameter repository. As I said, mapped data can be immediately calibrated on HPC clusters once they are available after the beam time, and we will then have hit-level data ready for the higher-level data analysis.
This depends on the detector. For NeuLAND, I'm sure you can run your calibration macros on the HPC cluster, but for other detectors we might need to do it manually. This isn't a problem, as most calibrations can be done using normal PCs, for example I'm using my laptop for the FRS-PID calibrations, and it's sufficient.
Regarding "parameter repository": the R3BParams_XXX repositories are needed to store all the parameters and analysis macros, to keep them under control, including the information about how the parameters were obtained.
Yeah, the actual meanings of "par-ID" and "run-ID" are quite confusing.
For clarification, when we run `FairRunAna`, there are three kinds of run IDs:
- the run ID stored in `FairEventHeader` from data files, which is managed by `FairSource`
- the run ID stored in `FairRtdbRun` from parameter files, which is managed by `FairRuntimeDb`
- the run ID specified by the user
Now the logic is as follows (if using R3BFileSource2, a.k.a. `R3B::FileSource`):
- if the run ID specified by the user is different from the run ID in the data file, the run ID in the data file will be used.
- if the run ID in the data file is different from the run ID given in the parameter file, the parameters will not be used and we will have an error.
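The two rules above could be condensed into a small sketch. This is a simplified illustration of the described behavior, not the actual FairRoot code; the function name and error handling are hypothetical:

```cpp
#include <stdexcept>

// Hypothetical sketch of the run-ID resolution described above:
// the data file's run ID overrides the user-specified one, and a
// mismatch with the parameter file's run ID is a hard error.
auto resolve_run_id(int user_run_id, int data_run_id, int par_run_id) -> int
{
    // Rule 1: if the user-specified run ID differs from the one in the
    // data file, the data file's run ID is used.
    const int effective_id = (user_run_id != data_run_id) ? data_run_id : user_run_id;

    // Rule 2: if the effective run ID differs from the run ID in the
    // parameter file, the parameters cannot be used.
    if (par_run_id != effective_id)
    {
        throw std::runtime_error("run ID mismatch between data and parameter file");
    }
    return effective_id;
}
```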
The so-called "par ID", if I'm not wrong, is the run ID given by the FairRtdbRun in the parameter file.
My opinion is that this run ID from a parameter file is useless and only causes trouble for the data analysis, for the following reasons:
- Users often use one parameter file for different runs. This is, by default, not possible. Thus we see many people manually change the run ID in the parameter file to the same value used in the `FairRun`.
- Even if users need to generate different parameters for different runs, they will then put each parameter set in a different file, due to the limitation that one run, executed by a node on the HPC cluster, must have its own individual output files (data + parameters). For this case, the run ID in the parameter file is also useless.
- If users need to put many parameters in a single file, maybe it's simpler to just store the parameters in a map?
```cpp
#include <map>

class Parameters
{
  public:
    auto get_parameter(int run_id) -> const Parameter&;

  private:
    std::map<int, Parameter> parameters;
};
```
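Fleshing out this idea into a self-contained sketch, with `Parameter` reduced to a hypothetical placeholder struct:

```cpp
#include <map>
#include <stdexcept>

// Placeholder for the real parameter content (an assumption for this sketch).
struct Parameter
{
    double time_offset = 0.0;
};

class Parameters
{
  public:
    void set_parameter(int run_id, Parameter par) { parameters_[run_id] = par; }

    // std::map::at throws std::out_of_range for an unknown run ID,
    // which turns a missing parameter set into an explicit error
    // instead of a silent fallback.
    auto get_parameter(int run_id) const -> const Parameter&
    {
        return parameters_.at(run_id);
    }

  private:
    std::map<int, Parameter> parameters_;
};
```

The run-ID-to-parameter association then lives in plain data rather than in the file-level `FairRtdbRun` bookkeeping.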
Hello @YanzhaoW
I will prepare an example to illustrate how it will work. But it might take a few days, as I am about to start my teaching duties.
> The so-called "par ID", if I'm not wrong, is the run ID given by the `FairRtdbRun` in the parameter file.
Yes, you are right. But let me prepare the example to illustrate how it works.