uproot5

`uproot.recreate` and `uproot.update` are using the colon-parsing of `uproot.open`, but they shouldn't

Open • jpivarski opened this issue 1 year ago • 1 comment

% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.recreate("/tmp/a::b.root")
<WritableDirectory '/' at 0x7ceb9ec89350>
>>> 
% ls /tmp/a*
/tmp/a
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.update("/tmp/a::b.root")
<WritableDirectory '/' at 0x759c5ff57ad0>
>>> 
% ls /tmp/a*
/tmp/a

These calls should create and update a file named `/tmp/a::b.root`, with the colons in the filename; instead, both operate on `/tmp/a`. It might get even weirder if the colons are in a directory in the full path, rather than in the final filename.
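As a quick sanity check that the filename itself is legal, here is a minimal sketch using only the standard library. It assumes a Linux or other POSIX filesystem (colons are not allowed in filenames on Windows), and the helper name `create_colon_file` is made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def create_colon_file(directory):
    """Create an empty file named 'a::b.root' and list the directory."""
    target = Path(directory) / "a::b.root"
    target.write_bytes(b"")  # plain OS-level write, no uproot or fsspec involved
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    names = create_colon_file(tmp)
    print(names)
```

If the listing contains the full name with both colons, the truncation to `/tmp/a` is coming from path parsing in the libraries, not from the filesystem.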


Passing a `pathlib.Path` doesn't fix it either. That's weird, because `pathlib.Path` is the way to turn off colon-parsing in `uproot.open`.

% rm /tmp/a*
rm: remove regular file '/tmp/a'? y
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> import uproot
>>> uproot.recreate(pathlib.Path("/tmp/a::b.root"))
<WritableDirectory '/' at 0x7e3acc793350>
>>> 
% ls /tmp/a*
/tmp/a

I thought the cause might be URL-parsing rather than the colon-parsing code (since files can now be written remotely). But no, that's not it; an explicit `file://` URL gives the same result:

% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.recreate("file:///tmp/a::b.root")
<WritableDirectory '/' at 0x78f47e54e810>
>>> 
% ls /tmp/a*
/tmp/a
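For comparison, a small sketch of what generic URL parsing does with the same string, using the standard library's `urllib.parse.urlsplit`. This is not what uproot or fsspec call internally; it only illustrates that ordinary URL splitting leaves the `::` inside the path component.

```python
from urllib.parse import urlsplit

# Split the same URL used in the reproducer above.
parts = urlsplit("file:///tmp/a::b.root")

print(parts.scheme)  # the URL scheme
print(parts.path)    # the path component, colons included
```

Since the path survives intact here, the truncation presumably happens in a later step that treats `::` specially.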

jpivarski • Jul 16 '24 16:07

This seems to come from fsspec, more precisely here. It seems to me that it happens because the code doesn't distinguish between `::` being part of the file name and `::` being a protocol separator. I tried changing the first lines to this

    if "::" in path:
        x = re.compile(".*[^a-z]+.*")  # test for non protocol-like single word
        bits = []
        for p in path.split("::"):
            # Check if part looks like a protocol or URL
            if "://" in p or x.match(p) or p in known_implementations:
                bits.append(p)
            else:
                # If not, assume it is part of the file name
                bits.append(p + "://")
        
        # If no part matches a known protocol, treat the entire path as a file name
        if not any(b for b in bits if b.strip("://") in known_implementations):
            bits = [path]
    else:
        bits = [path]

and your reproducer seems to work. I will open a PR in fsspec and get feedback from the maintainers.
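To make the difference concrete, here is a self-contained sketch that wraps the change above in a function and compares it with an unconditional split on `::`. Everything named here is an assumption for illustration: `split_chain_naive` is a simplified stand-in for the current behavior (not fsspec's actual code), `split_chain_guarded` follows the snippet above, and `KNOWN_IMPLEMENTATIONS` is a tiny hypothetical substitute for fsspec's registry of protocol names.

```python
import re

# Hypothetical stand-in for fsspec's registry of protocol names.
KNOWN_IMPLEMENTATIONS = {"file", "zip", "simplecache", "memory"}

# Matches any string that is not a bare lowercase word (e.g. contains '/' or '.').
_NON_PROTOCOL = re.compile(".*[^a-z]+.*")


def split_chain_naive(path):
    """Split on '::' unconditionally (simplified model of the reported behavior)."""
    if "::" not in path:
        return [path]
    return [
        p if "://" in p or _NON_PROTOCOL.match(p) else p + "://"
        for p in path.split("::")
    ]


def split_chain_guarded(path, known=KNOWN_IMPLEMENTATIONS):
    """Treat '::' as a chain separator only if some piece names a known protocol."""
    if "::" not in path:
        return [path]
    bits = []
    for p in path.split("::"):
        if "://" in p or _NON_PROTOCOL.match(p) or p in known:
            bits.append(p)  # URL, path-like piece, or known protocol: keep as is
        else:
            bits.append(p + "://")  # unknown bare word: mark it as a protocol
    # No piece is a known protocol, so the '::' belongs to the file name.
    if not any(b.strip("://") in known for b in bits):
        return [path]
    return bits


for candidate in ("/tmp/a::b.root", "file:///tmp/a::b.root", "zip::file:///tmp/x.zip"):
    print(candidate, split_chain_naive(candidate), split_chain_guarded(candidate))
```

In this sketch the guarded version keeps `/tmp/a::b.root` as a single piece while still splitting a real chain such as `zip::file:///tmp/x.zip`. One difference worth checking with the maintainers: a known protocol comes back as `zip` rather than `zip://`, so any downstream code that expects the suffix would need to handle both forms.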

maxgalli • Jan 28 '25 11:01