ldmx-sw icon indicating copy to clipboard operation
ldmx-sw copied to clipboard

Framework fails to copy basic-type branches to output file if they are observed

Open tomeichlersmith opened this issue 1 year ago • 3 comments

Describe the bug When writing an output file while reading an input file, the branches that are basic-type and being observed by the process are not copied into the output file properly. This is difficult to explain, so it is best to understand by looking through the example in To Reproduce.

To Reproduce I originally observed this with some actual simulation data I'm working with, but I condensed it into a smaller reproducible example here. I reference a few C++ source and python files that I've copied here since they are small. The TLDR on those files is that I produce two branches data and unobs which are both simple float branches. The "analysis" only looks at the data branch while creating a new file where both branches are copied into it. The resulting events file has a data branch that only has corrupted-looking data while the unobs branch has data matching what was originally produced.

Files for Reproducing

Produce.cxx

#include "Framework/EventProcessor.h"

#include <random>

class Produce : public framework::Producer {
  /// the random number generator, unseeded so it produces the same results each time
  std::mt19937 rng;
  /// the distribution of values in the Hits vector
  std::uniform_real_distribution<float> rand_float; 
 public:
  Produce(const std::string& name, framework::Process& p)
    : framework::Producer(name, p),
    rng{}, // this is where a seed for the RNG would be put
    rand_float{0.,100.}
  {}
  ~Produce() = default;
  void produce(framework::Event& event) final {
    float data = rand_float(rng);
    float unobs = rand_float(rng);
    event.add("data", data);
    event.add("unobs", unobs);
  }
};  // Produce

DECLARE_PRODUCER(Produce);

prod.py

from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('prod')
p.maxEvents = 10
p.outputFiles = [ 'events.root' ]
p.sequence = [ ldmxcfg.Producer.from_file('Produce.cxx') ]

Ana.cxx

#include "Framework/EventProcessor.h"

class Ana : public framework::Analyzer {
 public:
  Ana(const std::string& name, framework::Process& p)
    : framework::Analyzer(name, p) {}
  ~Ana() = default;
  void analyze(const framework::Event& event) final {
    const auto& data = event.getObject<float>("data");
    std::cout << data << std::endl;
  }
};  // Ana

DECLARE_ANALYZER(Ana);

ana.py

from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('ana')
p.inputFiles = [ 'events.root' ]
p.outputFiles = [ 'ana-events.root' ]
p.sequence = [ ldmxcfg.Analyzer.from_file('Ana.cxx') ]

check.py

import sys
import uproot
with uproot.open({sys.argv[1]:'LDMX_Events'}) as t:
    data = t['data_prod'].array(library='np')
    print(f'{data=}')
    unobs = t['unobs_prod'].array(library='np')
    print(f'{unobs=}')
  1. denv init ldmx/pro:v4.0.1
  2. denv fire prod.py
  3. denv fire ana.py
  4. Observe Error with check.py
tom@appa:~/code/ldmx/ana-basic-rewrite-bug$ denv python3 check.py events.root
data=array([81.47237 , 90.57919 , 12.698682, 91.337585, 63.235928,  9.75404 ,
       27.849823, 54.68815 , 95.75069 , 96.48885 ], dtype=float32)
unobs=array([13.547701, 83.500854, 96.88678 , 22.103405, 30.816704, 54.722057,
       18.838198, 99.28813 , 99.64613 , 96.76949 ], dtype=float32)
tom@appa:~/code/ldmx/ana-basic-rewrite-bug$ denv python3 check.py ana-events.root
data=array([2.6254541e+38, 1.6092322e-15, 1.6092322e-15, 1.6092322e-15,
       1.6092322e-15, 1.6092322e-15, 1.6092322e-15, 1.6092322e-15,
       1.6092322e-15, 1.6092322e-15], dtype=float32)
unobs=array([13.547701, 83.500854, 96.88678 , 22.103405, 30.816704, 54.722057,
       18.838198, 99.28813 , 99.64613 , 96.76949 ], dtype=float32)

Desired behavior I would really like for the Framework to be able to copy all types of branches whether or not they have been observed. While it sounds funny, being observed does trigger a different method of copying within the Framework since - if no processor needs a branch - we optimize the copy by letting ROOT copy the buffers directly instead of trying to load them into memory. This is not an issue with how the simple data types are being read as can be seen by the printouts within the "analysis" processor.

Environment: Output of denv config:

denv_workspace="/home/tom/code/ldmx/ana-basic-rewrite-bug"
denv_name="ana-basic-rewrite-bug"
denv_image="ldmx/pro:v4.0.1"
denv_mounts=""
denv_shell="/bin/bash -i"
denv_network="true"
Docker version 26.1.3, build b72abbb

tomeichlersmith avatar Jun 21 '24 19:06 tomeichlersmith

But isnt this a feature of uproot? like if you did an event loop in the check.py it should work for the ana-events.root, no?

tvami avatar Jun 21 '24 21:06 tvami

No, I originally found this issue with a subsequent ldmx-sw standalone processor and used uproot as a double check. (proof below).

To be very clear. This is only an issue with basic-type branches that are "root" branches and not the sub-branches that are generated when we create a branch with a more complicated parent class (e.g. std::vector<double> does not have this issue). That is why we haven't observed this before since most data is passed around in packages of these more complicated packages.

tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire prod.py 
---- LDMXSW: Loading configuration --------
Processor source file /home/tom/code/ldmx/1369-copy-obs-basic-types/Produce.cxx is newer than its compiled library /home/tom/code/ldmx/1369-copy-obs-basic-types/libProduce.so (or library does not exist), recompiling...
done compiling /home/tom/code/ldmx/1369-copy-obs-basic-types/Produce.cxx
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
---- LDMXSW: Event processing complete  --------
tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire ana.py 
---- LDMXSW: Loading configuration --------
Processor source file /home/tom/code/ldmx/1369-copy-obs-basic-types/Ana.cxx is newer than its compiled library /home/tom/code/ldmx/1369-copy-obs-basic-types/libAna.so (or library does not exist), recompiling...
done compiling /home/tom/code/ldmx/1369-copy-obs-basic-types/Ana.cxx
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
81.4724
90.5792
12.6987
91.3376
63.2359
9.75404
27.8498
54.6881
95.7507
96.4889
---- LDMXSW: Event processing complete  --------
tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire reana.py 
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
-6.38457e+35
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
---- LDMXSW: Event processing complete  --------

where prod.py and ana.py (and their referenced C++ files) are the same as in the original description. reana.py is copied below for completely but is just running the same ana processor but over the copied events file.

from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('ana')
p.inputFiles = [ 'ana-events.root' ]
p.sequence = [ ldmxcfg.Analyzer.from_file('Ana.cxx') ]
Wrapping Basic Types Avoids this Issue

If we wrap the basic type in a more complicated class, we can get around this issue easily.

Additions

to Produce.cxx:produce

    std::vector<float> wrap{data};
    event.add("wrap", wrap);

to Ana.cxx:analyze

    std::cout << data << " "; // instead of std::endl
    const auto& wrap = event.getCollection<float>("wrap").at(0);
    std::cout << wrap << std::endl;

Printouts

Left side is unwrapped, right side is wrapped

tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire prod.py 
---- LDMXSW: Loading configuration --------
Processor source file /home/tom/code/ldmx/1369-copy-obs-basic-types/Produce.cxx is newer than its compiled library /home/tom/code/ldmx/1369-copy-obs-basic-types/libProduce.so (or library does not exist), recompiling...
done compiling /home/tom/code/ldmx/1369-copy-obs-basic-types/Produce.cxx
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
---- LDMXSW: Event processing complete  --------
tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire ana.py 
---- LDMXSW: Loading configuration --------
Processor source file /home/tom/code/ldmx/1369-copy-obs-basic-types/Ana.cxx is newer than its compiled library /home/tom/code/ldmx/1369-copy-obs-basic-types/libAna.so (or library does not exist), recompiling...
done compiling /home/tom/code/ldmx/1369-copy-obs-basic-types/Ana.cxx
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
81.4724 81.4724
90.5792 90.5792
12.6987 12.6987
91.3376 91.3376
63.2359 63.2359
9.75404 9.75404
27.8498 27.8498
54.6881 54.6881
95.7507 95.7507
96.4889 96.4889
---- LDMXSW: Event processing complete  --------
tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire reana.py 
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
3.46836e+17 81.4724
3.46882e+17 90.5792
3.46882e+17 12.6987
3.46882e+17 91.3376
3.46882e+17 63.2359
3.46882e+17 9.75404
3.46882e+17 27.8498
3.46882e+17 54.6881
3.46882e+17 95.7507
3.46882e+17 96.4889
---- LDMXSW: Event processing complete  --------

tomeichlersmith avatar Jun 22 '24 17:06 tomeichlersmith

So, for any readers who didn't participate in writing the Bus which holds the event objects (i.e. everyone else), the handling of these in-memory objects is rather opaque. With this in mind, I'm going to elucidate a potential source of the issue.


ROOT does not expose a common interface for setting addresses of branches.[^1] I eventually found a solution of using TBranchElement::SetObject for all "higher level" branches and TBranch::SetAddress for the lower-level (BSILFD[^2]) branches. This is written into the attach method of the Bus::Passenger which uses argument-type-deduction to delegate to attachBasic for BSILFD types and attach for everything else. You'll notice that this is the exact separation we are observing. Once we wrap a float (for example) in a more complicated class the issue goes away.

This makes me want to investigate the attachBasic implementation to see if we need to modify it.

https://github.com/LDMX-Software/ldmx-sw/blob/ebbf27abb5b46398e8539c7ab02cfd9e77adcc69/Framework/include/Framework/Bus.h#L513-L530

Perhaps adding a branch->SetAutoDelete(false) like

https://github.com/LDMX-Software/ldmx-sw/blob/ebbf27abb5b46398e8539c7ab02cfd9e77adcc69/Framework/include/Framework/Bus.h#L475-L477

which is what is done in attach.

[^1]: Caveat: I could revisit this and look at using TBranch::SetAddress everywhere. At the time, I could not get that to work and so here we are.

[^2]: Bool Short Int Long Float Double

tomeichlersmith avatar Sep 26 '24 14:09 tomeichlersmith