uproot5 icon indicating copy to clipboard operation
uproot5 copied to clipboard

`uproot` cannot write root files over xrootd.

Open ikrommyd opened this issue 1 year ago • 4 comments

To reproduce:

import uproot

file = uproot.recreate("root://cmseos.fnal.gov//store/user/ikrommyd/test/dummy.root")
file["tree"] = {"branch": [1,2,3]}
file.close()

fails with:

OSError                                   Traceback (most recent call last)
Cell In[2], line 1
----> 1 uproot.recreate("root://cmseos.fnal.gov//store/user/ikrommyd/test/dummy.root")

File ~/miniforge3/envs/work/lib/python3.11/site-packages/uproot/writing/writable.py:110, in recreate(file_path, **options)
    106 file_path = uproot._util.regularize_path(file_path)
    107 storage_options = {
    108     key: value for key, value in options.items() if key not in recreate.defaults
    109 }
--> 110 sink = uproot.sink.file.FileSink(file_path, **storage_options)
    111 compression = options.pop("compression", create.defaults["compression"])
    113 initial_directory_bytes = options.pop(
    114     "initial_directory_bytes", create.defaults["initial_directory_bytes"]
    115 )

File ~/miniforge3/envs/work/lib/python3.11/site-packages/uproot/sink/file.py:53, in FileSink.__init__(self, urlpath_or_file_like, **storage_options)
     51 else:
     52     if not self._file_exists(urlpath_or_file_like, **storage_options):
---> 53         self._truncate_file(urlpath_or_file_like, **storage_options)
     55     self._open_file = fsspec.open(
     56         urlpath_or_file_like, mode="r+b", **storage_options
     57     )

File ~/miniforge3/envs/work/lib/python3.11/site-packages/uproot/sink/file.py:81, in FileSink._truncate_file(cls, urlpath, **storage_options)
     79 parent_directory = fs.sep.join(local_path.split(fs.sep)[:-1])
     80 fs.mkdirs(parent_directory, exist_ok=True)
---> 81 fs.touch(local_path, truncate=True)

File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    115 @functools.wraps(func)
    116 def wrapper(*args, **kwargs):
    117     self = obj or args[0]
--> 118     return sync(self.loop, func, *args, **kwargs)

File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec_xrootd/xrootd.py:380, in XRootDFileSystem._touch(self, path, truncate, **kwargs)
    376     status, _ = await _async_wrap(self._myclient.truncate)(
    377         path, size=0, timeout=self.timeout
    378     )
    379     if not status.ok:
--> 380         raise OSError(f"File not touched properly: {status.message}")
    381 else:
    382     len = await self._info(path)

OSError: File not touched properly: [ERROR] Server responded with an error: [3013] Unable to truncate (null); Operation not supported

The below reproducers work fine locally but will fail if "dummy" becomes a remote xrootd path.

import awkward as ak
import dask_awkward as dak
import uproot

# Eager
array = ak.Array({"x":[1,2,3], "y":[4,5,6]})
file = uproot.recreate("dummy.root")
file["tree"] = array

# Lazy
x = dak.from_awkward(array, 3)
uproot.dask_write(x, "dummy", prefix="whatever", compute=True)

ikrommyd avatar Jul 19 '24 13:07 ikrommyd

@jpivarski when you find some time, please look into this as well. Also mentioning @martindurant cause I don't know if fsspec is relevant to this.

ikrommyd avatar Jul 29 '24 21:07 ikrommyd

The exception seems to be coming from xrootd, not fsspec. I can't say why the file didn't truncate.

martindurant avatar Jul 30 '24 01:07 martindurant

@amadio gave us an alternative in https://github.com/xrootd/xrootd/issues/2304#issuecomment-2273394816, which can replace

https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L70-L81

(which calls truncate in fsspec-xrootd). It is only ever called from

https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L52-L53

@amadio's alternative involves opening the file with a different set of flags.

@maxgalli, do you think that can be implemented in Uproot, or does that fix need to go into fsspec-xrootd? I was at first thinking that we could do it, but after looking at the code references, it doesn't look like we have direct access to the XRootD.client.File object (anymore, now that we've adopted fsspec).

jpivarski avatar Aug 15 '24 16:08 jpivarski

@amadio gave us an alternative in xrootd/xrootd#2304 (comment), which can replace

https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L70-L81

(which calls truncate in fsspec-xrootd). It is only ever called from

https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L52-L53

@amadio's alternative involves opening the file with a different set of flags.

@maxgalli, do you think that can be implemented in Uproot, or does that fix need to go into fsspec-xrootd? I was at first thinking that we could do it, but after looking at the code references, it doesn't look like we have direct access to the XRootD.client.File object (anymore, now that we've adopted fsspec).

Hi @jpivarski, sorry for the late reply. It seems to me that implementing Guillerme's suggestion into fsspec-xrootd might be the way to go, but I'm not sure as I'm not familiar with the code yet. I will look into this in more detail in a few weeks from now, after thesis submission (I would thus keep the issue open for now if it is ok).

maxgalli avatar Aug 16 '24 09:08 maxgalli

I'm confused, why do we need to have truncate=True in the touch command if it is only called when the file doesn't exist?

nsmith- avatar Dec 19 '24 19:12 nsmith-

Since you later want to open a file with truncate on open, I would suggest to use that interface in fsspec-xrootd. Open the file with mode w once to touch/create/truncate it, then open it for r+b later.

nsmith- avatar Dec 19 '24 19:12 nsmith-

Furthermore, I suspect interacting in read-write mode with a remote filesystem is not a great idea. We should encourage users to add the simplecache:: (see fsspec docs) prefix to the destination so a local temporary file can be opened, edited and finalized, before it is written once to the remote server on close.

nsmith- avatar Dec 19 '24 19:12 nsmith-

Hi Nick, I came back to this, doing some investigation and trying to follow your suggestion. However, there are a few things I don't understand. If I try to first touch the file and then re-open it in r+b mode, I get a raise NotImplementedError("File mode not supported") error rising from here. This makes sense, as r+b is not considered in this set. Moreover, I would have naively expected r+b to be assigned OpenFlags.UPDATE in these lines, but instead it seems to fall into the last category. Is this behavior intended?

maxgalli avatar Jan 22 '25 15:01 maxgalli

Ah, this might be a mistake, perhaps we can allow a mode? Though, with a we can only append. I'm not sure what the comment update "+" mode removed for now since seek() is read only refers to. Does xrootd support rw seek?

Did you try the simplecache::? Then we would only write to the remote storage once.

nsmith- avatar Jan 22 '25 15:01 nsmith-

Ah, this might be a mistake, perhaps we can allow a mode? Though, with a we can only append. I'm not sure what the comment update "+" mode removed for now since seek() is read only refers to. Does xrootd support rw seek?

I think the comment might refer to this: the inherited seek function raises an error if not in rb mode. However, I thought it could maybe be implemented in fsspec-xrootd, in order to change the behavior? As for xrootd, I looked around and it should support rw seek.

Did you try the simplecache::? Then we would only write to the remote storage once.

Not yet, I'm still trying to find a combination of changes in uproot/fsspec/fsspec-xrootd that fixes the bug - will add it here.

maxgalli avatar Jan 22 '25 17:01 maxgalli