Can not access REntry fields with DataVector in python

Open Nowakus opened this issue 6 months ago • 1 comments

Check duplicate issues.

  • [x] Checked for duplicates

Description

An error is reported when trying to access REntry fields whose type is a DataVector (a class with a default template argument) from Python:

entry["HLTNav_RepackedFeatures_MET"]      # field of type: DataVector<xAOD::TrigMissingET_v1>

 File "/cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev3/Tue/ROOT/HEAD/x86_64-el9-gcc13-opt/lib/ROOT/_pythonization/_rntuple.py", line 28, in _REntry_getitem
    ptr_proxy = self._CallGetPtr(key)
 File "/cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev3/Tue/ROOT/HEAD/x86_64-el9-gcc13-opt/lib/ROOT/_pythonization/_rntuple.py", line 24, in _REntry_CallGetPtr
    return self._GetPtr[fieldType](key)
TypeError: Could not find "GetPtr<DataVector<xAOD::TrigMissingET_v1>>" (set cppyy.set_debug() for C++ errors):
  Failed to instantiate "GetPtr<DataVector<xAOD::TrigMissingET_v1>>(::ROOT::RFieldToken&)"

Reproducer

# standard ATLAS Athena setup:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source $ATLAS_LOCAL_ROOT_BASE/user/atlasLocalSetup.sh
asetup main--dev3LCG,latest,Athena
python

import ROOT
rnt = ROOT.RNTupleReader.Open("EventData", "/afs/cern.ch/user/m/mnowak/public/DAOD_PHYS.rntuple.pool.root")
entry = rnt.CreateEntry()
rnt.LoadEntry(1, entry)
entry["HLTNav_RepackedFeatures_MET"]

ROOT version

master

Installation method

LCG dev3

Operating system

Linux

Additional context

No response

Nowakus avatar Jun 12 '25 12:06 Nowakus

Just to expand on what @Nowakus wrote, the use case is to read all the fields, something along these lines:

reader = ROOT.RNTupleReader.Open(
    ROOT.TFile.Open(fileName).Get("MetaData")
)
entry = reader.CreateEntry()
reader.LoadEntry(0, entry)

for field in reader.GetDescriptor().GetTopLevelFields():
    myObj = entry[field.GetFieldName()]
    # extracting the data from myObj and writing to a dictionary

In other words, we want the same information that reader.Show(0) prints (there is always only one entry in such an ntuple, since it holds metadata). Show writes to std::ostream, which PyROOT does not capture, so one would have to call e.g. subprocess.run(...), which is not very nice. Reading the fields directly lets us store the data as a dict instead.
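A minimal sketch of that use case, written so that one unreadable field does not abort the whole loop. The helper name `entry_to_dict` is hypothetical; it only relies on the calls already shown above (`GetDescriptor()`, `GetTopLevelFields()`, `GetFieldName()` and `entry[...]`), and it keeps the returned objects as they are rather than converting them further.

```python
def entry_to_dict(reader, entry):
    """Collect every top-level field of `entry` into a plain dict.

    Returns (values, errors): `values` maps field name -> object returned by
    entry[name]; `errors` maps field name -> text of the exception raised.
    """
    values, errors = {}, {}
    for field in reader.GetDescriptor().GetTopLevelFields():
        name = str(field.GetFieldName())
        try:
            values[name] = entry[name]
        except Exception as exc:  # e.g. the error reported in this issue
            errors[name] = f"{type(exc).__name__}: {exc}"
    return values, errors


# Intended usage inside a ROOT session (not executed here):
#   reader = ROOT.RNTupleReader.Open(ROOT.TFile.Open(fileName).Get("MetaData"))
#   entry = reader.CreateEntry()
#   reader.LoadEntry(0, entry)
#   values, errors = entry_to_dict(reader, entry)
```

Collecting the failures separately makes it easy to list exactly which field types are affected.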

When the loop above arrives at the DataVector field, an exception is thrown:

RException: Could not find "GetPtr<DataVector<xAOD::TrigMissingET_v1>>" (set cppyy.set_debug() for C++ errors):
  shared_ptr<DataVector<xAOD::TrigMissingET_v1> > ROOT::REntry::GetPtr(ROOT::RFieldToken token) =>
    RException: type mismatch for field HLTNav_RepackedFeatures_MET: DataVector<xAOD::TrigMissingET_v1> vs. DataVector<xAOD::TrigMissingET_v1,DataModel_detail::NoBase>
At:
  void ROOT::REntry::EnsureMatchingType(ROOT::RFieldToken) const [T = DataVector<xAOD::TrigMissingET_v1, DataModel_detail::NoBase>] [/build/jenkins/workspace/lcg_nightly_pipeline/build/projects/ROOT-HEAD/src/ROOT-HEAD-build/include/ROOT/REntry.hxx:140]

The same error can be reproduced using https://gitlab.cern.ch/maszyman/rntuple-atlas-datavector (thus the issue does not seem to be limited to Python).
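To make the mismatch in the message concrete, here is a small illustrative sketch that splits a C++ type name into its top-level template arguments. The function `split_template_args` is hypothetical and is not how ROOT compares type names; it only shows that the two spellings in the error differ by the defaulted second argument.

```python
def split_template_args(type_name):
    """Return the top-level template arguments of a C++ type name."""
    start = type_name.find("<")
    if start == -1:
        return []
    end = type_name.rfind(">")
    args, depth, current = [], 0, ""
    for ch in type_name[start + 1:end]:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside nested <...> separates two arguments.
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        args.append(current.strip())
    return args


# The two spellings that appear in the error message above.
short_name = "DataVector<xAOD::TrigMissingET_v1>"
full_name = "DataVector<xAOD::TrigMissingET_v1,DataModel_detail::NoBase>"

print(split_template_args(short_name))
print(split_template_args(full_name))
```

The first call yields one argument and the second yields two, so a plain string comparison of the names cannot succeed.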

maszyman avatar Jun 12 '25 13:06 maszyman

Hi @vepadulano, @enirolf,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely, :robot:

github-actions[bot] avatar Jun 27 '25 06:06 github-actions[bot]

Just to confirm https://github.com/root-project/root/pull/19087 fixes the ATLAS use case.

Thanks!

maszyman avatar Jun 30 '25 08:06 maszyman

My example is unfortunately still failing in exactly the same way as described. Using this build:

| Welcome to ROOT 6.37.01 https://root.cern |
| Built for linuxx8664gcc on Jun 27 2025, 22:56:06 |
| From heads/master@v6-37-01-7199-g0ba8001c8c |

Interestingly enough, after running the loop from Maciek it starts to work:

>>> for field in rnt.GetDescriptor().GetTopLevelFields():
...     myObj = entry[field.GetFieldName()]
...
>>> entry["HLTNav_RepackedFeatures_MET"]
<cppyy.gbl.DataVector<xAOD::TrigMissingET_v1> object at 0xb0779f0 held by std::shared_ptr<DataVector<xAOD::TrigMissingET_v1> > at 0x2bd0caa0>

Nowakus avatar Jun 30 '25 10:06 Nowakus

@Nowakus can you remind me, are you defining the IsCollectionProxy type trait or have the using IsCollectionProxy = std::true_type; member type as described here: https://github.com/root-project/root/blob/504130023e6cc46d38d3a26ada908036ff1bc945/tree/ntuple/inc/ROOT/RField/RFieldProxiedCollection.hxx#L251-L264

edit: the Experimental is of course a mistake, let me fix that...

hahnjo avatar Jun 30 '25 10:06 hahnjo

As far as I can see the word "IsCollectionProxy" does not show up in our code anywhere.

I am not sure if this is the same, but we generate and install TGenCollectionProxy by hand for DataVectors - but that happens when dictionaries are loaded only.

Nowakus avatar Jun 30 '25 12:06 Nowakus

> As far as I can see the word "IsCollectionProxy" does not show up in our code anywhere.
>
> I am not sure if this is the same, but we generate and install TGenCollectionProxy by hand for DataVectors - but that happens when dictionaries are loaded only.

Ok, that's a problem for the typed API (which we are using from Python) because the compiler chooses the wrong class hierarchy, and then you will get the error "DataVector has an associated collection proxy; use RProxiedCollectionField instead" (at least I assume that's what you are seeing?). We need the type traits so that it works correctly.

hahnjo avatar Jun 30 '25 13:06 hahnjo

The error is the one in the description at the very top

Nowakus avatar Jun 30 '25 13:06 Nowakus

Ok indeed there seem to be more problems; with cppyy.set_debug():

lookup.funcname.file:1:8: error: too few template arguments for class template 'DataVector'
GetPtr<DataVector<xAOD::TrigMissingET_v1>>
       ^
input_line_154:1:44: note: template is declared here
template <typename T, typename BASE> class DataVector;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       ^

@vepadulano

hahnjo avatar Jun 30 '25 13:06 hahnjo

I can reproduce this locally. The issue is now localised to the Python side only, and it is a different issue from the one fixed by the merged PR.

vepadulano avatar Jun 30 '25 16:06 vepadulano

Hi @Nowakus ,

I have found a way to make your reproducer work, although for now it is only a workaround. It looks like the issue stems from the way we try to instantiate the C++ template from the Python bindings. See the following working example, run on lxplus within the Athena environment:

import ROOT
rnt = ROOT.RNTupleReader.Open("EventData", "mnowak_file.root")
entry = rnt.CreateEntry()
rnt.LoadEntry(1, entry)
# This works
entry._GetPtr[ROOT.DataVector[ROOT.xAOD.TrigMissingET_v1]]("HLTNav_RepackedFeatures_MET")

Notice that I'm passing the Python proxy of the class type to the _GetPtr template instantiation. It looks like, at the moment, instantiation via the type name string is not doing the same amount of work. I am currently investigating this.
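The workaround can be wrapped in a small helper, sketched below. The name `get_field_ptr` is hypothetical, and `_GetPtr` is a private detail of the REntry pythonization taken from the example above, so it may change without notice. The helper assumes the failure surfaces as a TypeError, as in the traceback at the top of this issue.

```python
def get_field_ptr(entry, field_name, class_proxy=None):
    """Read a field via entry[field_name]; fall back to the class-proxy workaround.

    `class_proxy` is the Python proxy of the field's C++ type, for example
    ROOT.DataVector[ROOT.xAOD.TrigMissingET_v1].
    """
    try:
        return entry[field_name]
    except TypeError:
        if class_proxy is None:
            raise
        # Assumption: `_GetPtr` is the private template proxy used in the
        # workaround above; it is not part of the public API.
        return entry._GetPtr[class_proxy](field_name)


# Intended usage inside a ROOT session (not executed here):
#   ptr = get_field_ptr(entry, "HLTNav_RepackedFeatures_MET",
#                       ROOT.DataVector[ROOT.xAOD.TrigMissingET_v1])
```

Fields that already work through the string-based lookup are unaffected, and the proxy is only used when that lookup fails.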

vepadulano avatar Jul 01 '25 11:07 vepadulano

I see the same. Thanks for the update.

Nowakus avatar Jul 01 '25 12:07 Nowakus

Hi, another update. I think I can now explain why we're seeing this issue. Unfortunately, I cannot yet provide a solution.

Let's start with an important clarification. The Python bindings are designed to be lazy whenever possible. If a certain ROOT class/function/attribute is not requested via the Python bindings, it won't be loaded. In this particular scenario, the main difference between the following (class names taken from my local reproducer of the first part of this issue, now available as a test at https://github.com/root-project/root/blob/master/roottest/root/ntuple/atlas-datavector/AtlasLikeDataVector.hxx)

ROOT.foo["AtlasLikeDataVector<CustomStruct>"]
ROOT.foo[ROOT.AtlasLikeDataVector[ROOT.CustomStruct]]

is that in the second case, by retrieving a Python proxy to the AtlasLikeDataVector and CustomStruct classes, we're actively asking ROOT to populate the related information of these classes in the typesystem. In the first case, the string doesn't immediately correspond to a request, so the loading of the class information is treated lazily.

Once the function is being tried for instantiation by cppyy, at some point it enters TemplateProxy::Instantiate which tries to instantiate the real C++ template for the function. In the case of the argument passed by string, this fails. The reason is that AtlasLikeDataVector<CustomStruct> was not autoloaded before, which instead happens with ROOT.AtlasLikeDataVector[ROOT.CustomStruct].
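The difference between the two spellings can be pictured with a toy model. Everything in this sketch is hypothetical (the class `LazyTypeSystem` and its methods are not ROOT or cppyy code); it only mimics the behaviour described above, where attribute access counts as an explicit request and a string argument does not.

```python
class LazyTypeSystem:
    """Toy stand-in for a lazily populated type system (not ROOT's actual code)."""

    def __init__(self, available):
        self._available = set(available)  # classes that could be loaded
        self.loaded = set()               # classes whose information is loaded

    def __getattr__(self, name):
        # Attribute access is an explicit request, so it triggers loading.
        if name.startswith("_") or name not in self._available:
            raise AttributeError(name)
        self.loaded.add(name)
        return name

    def instantiate(self, template_arg):
        # A string argument is only looked up; nothing is loaded on demand.
        if template_arg not in self.loaded:
            raise TypeError(f'Could not find "foo<{template_arg}>"')
        return f"foo<{template_arg}>"


ts = LazyTypeSystem({"AtlasLikeDataVector"})
try:
    ts.instantiate("AtlasLikeDataVector")   # fails: nothing was requested yet
except TypeError as exc:
    print(exc)
ts.AtlasLikeDataVector                      # explicit request loads the class
print(ts.instantiate("AtlasLikeDataVector"))
```

In this model the string-based call starts to work once something else has requested the class, which resembles the observation that the lookup succeeds after the loop over all fields has run.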

One idea I'm testing right now is to have TemplateProxy::Instantiate retry in case of first failure of Cppyy::GetMethodTemplate and actually load the class information before trying to instantiate the template. Keeping aside for a moment the fact that the string manipulation of a full function template signature is shaky at best, I've started investigating what happens when calling TClass::GetClass("AtlasLikeDataVector<CustomStruct>") right before the new call to GetMethodTemplate. To my surprise, TClass immediately finds AtlasLikeDataVector<CustomStruct> via TClassTable::GetDictNorm(name) and early-exits. Thus, no autoloading happens, but it should happen in this case to make it work.

So, somehow, the information about AtlasLikeDataVector<CustomStruct> was loaded in the typesystem, but only partially/inconclusively. This happens as follows.

Back in the first call of GetMethodTemplate inside of TemplateProxy::Instantiate there is a call to TCling::GetFunctionWithPrototype to get the function signature corresponding to the string "foo<AtlasLikeDataVector<CustomStruct>>". This eventually calls into TSystem::Load and somehow TCling::AutoLoad("CustomStruct"). That is, only the information about the innermost part of the template is AutoLoaded, not AtlasLikeDataVector. But, since the loading is happening inside of the same library generated from the dictionary source, the AtlasLikeDataVector class is still loaded and cached in the list of classes available to ROOT. This is how it is then immediately found by TClass::GetClass. I attach a full stacktrace of this part.

customstruct_autoload.txt

I thought about getting the real normalized name of AtlasLikeDataVector<CustomStruct>, which should be AtlasLikeDataVector<CustomStruct, DataModel_detail::NoBase>, via TClassEdit::GetNormalizedName. That fails for the same reason: the class is already available, so the function returns early and the name never really gets normalized.

@pcanal Somehow I think TClass::GetClass should be able to detect this situation, call AutoLoad for AtlasLikeDataVector<CustomStruct>, but I don't know yet how.

vepadulano avatar Jul 03 '25 11:07 vepadulano

Just to confirm that Marcin's example now works using dev3.

Thanks!

maszyman avatar Jul 10 '25 07:07 maszyman

Thank you for confirming and for your help with the reproducers! We can now close this issue.

vepadulano avatar Jul 10 '25 07:07 vepadulano