Unable to use EOS tokens with RDataFrame since 6.32
Check duplicate issues.
- [X] Checked for duplicates
Description
EOS tokens no longer work with RDataFrame in 6.32.04. In 6.30.08 everything is fine:
$ python3
Python 3.9.18 (main, Aug 23 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> url = 'root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=' + open("token.txt").read().strip()
>>> ROOT.TFile.Open(url).ls()
TNetXNGFile** root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root Demo ROOT file with histograms
TNetXNGFile* root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root Demo ROOT file with histograms
KEY: TH1F hpx;1 This is the px distribution
KEY: TH2F hpxpy;1 py vs px
KEY: TProfile hprof;1 Profile of pz versus px
KEY: TNtuple ntuple;1 Demo ntuple
>>> df = ROOT.RDataFrame("ntuple", url)
>>>
Reproducer
On lxplus:
$ source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.32.04/x86_64-almalinux9.4-gcc114-opt/bin/thisroot.sh
$ cp /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.32.04/x86_64-almalinux9.4-gcc114-opt/tutorials/hsimple.root /eos/user/c/cburr/hsimple.root
$ EOS_MGM_URL=root://eoshome-c.cern.ch eos token --path /eos/user/c/cburr/hsimple.root --permission=rx --expires=$(date +%s -d "30 minutes") > token.txt
$ kdestroy
$ python3
Python 3.9.18 (main, Aug 23 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> url = 'root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=' + open("token.txt").read().strip()
>>> ROOT.TFile.Open(url).ls()
TNetXNGFile** root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root Demo ROOT file with histograms
TNetXNGFile* root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root Demo ROOT file with histograms
KEY: TH1F hpx;1 This is the px distribution
KEY: TH2F hpxpy;1 py vs px
KEY: TProfile hprof;1 Profile of pz versus px
KEY: TNtuple ntuple;1 Demo ntuple
>>> df = ROOT.RDataFrame("ntuple", url)
Error in <TNetXNGSystem::GetDirEntry>: Unable to give access - user access restricted - unauthorized identity used ; Permission denied
*** Break *** segmentation violation
ROOT version
6.32.04
Installation method
sft.cern.ch
Operating system
Linux (lxplus)
Additional context
No response
Dear @chrisburr ,
Thank you for reaching out and for the reproducer. I am on it. Meanwhile, I just wanted to point out that in the first case, with 6.30, just calling ROOT.RDataFrame will not attempt to open the file, whereas 6.32 opens the file at construction time (to homogenise the way different data formats are processed). Just as a confirmation, could you try running any operation that needs to read data from the file in the first case with 6.30?
Thanks! This definitely used to be working (with 6.28 IIRC). If I find a minute I'll check with 6.30.
The problem is that RDF tries to open the file to check that it's valid. The logic for the file opening is at https://github.com/root-project/root/blob/962009b8c2057199c2229c3ef9938ac4d315d10a/tree/dataframe/src/RLoopManager.cxx#L1133 . In particular, because of the presence of the ? character, the string is parsed as a glob. In many cases that would be harmless, albeit with a tiny overhead (it would just return the same file name to open), but in this particular case it triggers faulty behaviour. The glob parsing attempts to traverse the remote xrootd directory (see here), but since the token grants permission only for the single file and not for the entire directory, it leads to the user access restricted error you posted above.
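To illustrate the effect described above, here is a minimal Python sketch of a purely character-based glob check. The helper name `looks_like_glob`, the wildcard set, and the placeholder token are assumptions made for illustration; this is not the actual logic in RLoopManager.cxx.

```python
# Hypothetical helper for illustration only, not ROOT's implementation.
# Assumed wildcard set: the usual shell-style glob characters.
GLOB_CHARS = set("*?[]")


def looks_like_glob(name: str) -> bool:
    """Return True if any wildcard character appears anywhere in the name."""
    return any(ch in GLOB_CHARS for ch in name)


plain = "root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root"
# The token value is a placeholder, not a real EOS token.
with_token = plain + "?xrd.wantprot=unix&authz=PLACEHOLDER_TOKEN"

print(looks_like_glob(plain))       # False
print(looks_like_glob(with_token))  # True: the '?' of the query string matches
```

With a check of this kind, any URL that carries a query string is classified as a glob, which would then lead to a directory listing that a single-file token cannot authorise.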
Now, I believe the most sane course of action would be to refine the logic that checks whether the input file name is a glob. I could simply add a check for the xrd.wantprot token, but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob. This probably includes not only xrootd tokens but also anything https-related. Or we could adopt a different strategy for glob detection altogether. Thoughts @dpiparo @pcanal ?
Ah, that makes sense. Extending the definition of strings to add metadata to paths (globbing, the # syntax in TFile::Open, ...) is always going to be error-prone.
> but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob
This feels like an impossible task to define.
Maybe a simpler solution would be to not support ? when globbing and only apply globbing to the text before the query string? Or maybe just have a dedicated method (or argument type) for creating an RDataFrame from a glob rather than relying on heuristics?
> Maybe a simpler solution would be to not support ? when globbing and only apply globbing to the text before the query string? Or maybe just have a dedicated method (or argument type) for creating an RDataFrame from a glob rather than relying on heuristics?
The first option was implemented in the linked PR, aligning the functionality with that of TChain::Add. For the second option, I agree that it would be a good possibility, but it requires extending the RDF interface, so I didn't want to introduce it in a bugfix. This can be reassessed later on when necessary.
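As a rough illustration of the first option (globbing applied only to the text before the query string, with ? no longer treated as a wildcard), here is a minimal Python sketch. The function names `split_query` and `needs_glob_expansion`, the wildcard set, and the example URLs are assumptions for illustration; the code in the linked PR may differ.

```python
# Hypothetical sketch, not the code from the linked PR.
# Assumed wildcard set; '?' is deliberately excluded because it starts the query.
GLOB_CHARS = set("*[]")


def split_query(name: str) -> tuple[str, str]:
    """Split a file name or URL at the first '?' into (path, query).

    The returned query keeps its leading '?' so it can be re-attached
    unchanged to each expanded path.
    """
    path, sep, query = name.partition("?")
    return path, sep + query


def needs_glob_expansion(name: str) -> bool:
    """Look for wildcard characters only in the part before the query string."""
    path, _ = split_query(name)
    return any(ch in GLOB_CHARS for ch in path)


# Placeholder token values, not real EOS tokens.
single = "root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=PLACEHOLDER"
many = "root://eosuser.cern.ch//eos/user/c/cburr/run_*.root?authz=PLACEHOLDER"

print(needs_glob_expansion(single))  # False: opened directly, no directory listing
print(needs_glob_expansion(many))    # True: '*' appears before the query string
print(split_query(many)[1])          # ?authz=PLACEHOLDER
```

Under this scheme a single-file URL with a token is opened as is, while a genuine wildcard in the path still triggers expansion and the query string is carried along untouched.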