uproot5
`uproot.recreate` and `uproot.update` are using the colon-parsing of `uproot.open`, but they shouldn't
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.recreate("/tmp/a::b.root")
<WritableDirectory '/' at 0x7ceb9ec89350>
>>>
% ls /tmp/a*
/tmp/a
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.update("/tmp/a::b.root")
<WritableDirectory '/' at 0x759c5ff57ad0>
>>>
% ls /tmp/a*
/tmp/a
These should create and update a file named /tmp/a::b.root, with the colons in the filename. It might get even weirder if the colons are in a directory in the full path, rather than the final filename.
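As a sanity check that the filesystem itself is not the obstacle, here is a minimal sketch using only the standard library. It assumes a POSIX filesystem (as in the Linux sessions above); the helper name `create_colon_named_file` is made up for illustration, and the file it writes is an empty placeholder, not a real ROOT file.

```python
import tempfile
from pathlib import Path


def create_colon_named_file(directory: str) -> list[str]:
    """Create 'a::b.root' inside `directory` and return the directory listing."""
    target = Path(directory) / "a::b.root"
    target.write_bytes(b"")  # empty placeholder, not a real ROOT file
    return sorted(p.name for p in Path(directory).iterdir())


# Use a throwaway directory so nothing is left behind in /tmp.
with tempfile.TemporaryDirectory() as tmp:
    names = create_colon_named_file(tmp)
    print(names)
```

If the listing shows the full name with both colons, the truncation to /tmp/a has to come from path handling in the library stack rather than from the operating system.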
This isn't even fixed by a pathlib.Path. That's weird, because pathlib.Path is the way to turn off colon-parsing in uproot.open.
% rm /tmp/a*
rm: remove regular file '/tmp/a'? y
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> import uproot
>>> uproot.recreate(pathlib.Path("/tmp/a::b.root"))
<WritableDirectory '/' at 0x7e3acc793350>
>>>
% ls /tmp/a*
/tmp/a
I thought the culprit might not be the colon-parsing code but URL-parsing instead (since files can now be written remotely). But no, that's not it:
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.recreate("file:///tmp/a::b.root")
<WritableDirectory '/' at 0x78f47e54e810>
>>>
% ls /tmp/a*
/tmp/a
This seems to come from fsspec, more precisely here. It seems to me that it happens because the code doesn't distinguish between a :: that is part of the file name and a :: that is used as a protocol separator.
I tried changing the first lines to this:
if "::" in path:
    x = re.compile(".*[^a-z]+.*")  # test for non protocol-like single word
    bits = []
    for p in path.split("::"):
        # Check if part looks like a protocol or URL
        if "://" in p or x.match(p) or p in known_implementations:
            bits.append(p)
        else:
            # If not, assume it is part of the file name
            bits.append(p + "://")
    # If no part matches a known protocol, treat the entire path as a file name
    if not any(b for b in bits if b.strip("://") in known_implementations):
        bits = [path]
else:
    bits = [path]
and your reproducer seems to work. I will open a PR in fsspec and get feedback from the maintainers.
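To make the ambiguity concrete, here is a small standalone sketch of what unconditional splitting on :: does to a plain file name. It is a simplified illustration under my own assumptions, not fsspec's actual code: the function name `naive_split_chain` is hypothetical, and only the regular expression and the "://" check are taken from the snippet above.

```python
import re

# Same "non protocol-like single word" test as in the snippet above.
NOT_PROTOCOL_LIKE = re.compile(".*[^a-z]+.*")


def naive_split_chain(path: str) -> list[str]:
    """Split `path` on '::' without checking whether any part is a real protocol."""
    if "::" not in path:
        return [path]
    return [
        # Parts that contain '://' or any non-lowercase-letter character are kept;
        # bare lowercase words are assumed to be protocols and get '://' appended.
        p if "://" in p or NOT_PROTOCOL_LIKE.match(p) else p + "://"
        for p in path.split("::")
    ]


print(naive_split_chain("/tmp/a::b.root"))          # plain file name with colons
print(naive_split_chain("zip::file:///tmp/x.zip"))  # genuinely chained URL
```

The file name is cut into two parts, "/tmp/a" and "b.root", exactly as if it were a chained URL, which would be consistent with /tmp/a being the file that ends up on disk.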