uproot5
uproot5 copied to clipboard
`uproot` cannot write root files over xrootd.
To reproduce:
import uproot
file = uproot.recreate("root://cmseos.fnal.gov//store/user/ikrommyd/test/dummy.root")
file["tree"] = {"branch": [1,2,3]}
file.close()
fails with:
OSError Traceback (most recent call last)
Cell In[2], line 1
----> 1 uproot.recreate("root://cmseos.fnal.gov//store/user/ikrommyd/test/dummy.root")
File ~/miniforge3/envs/work/lib/python3.11/site-packages/uproot/writing/writable.py:110, in recreate(file_path, **options)
106 file_path = uproot._util.regularize_path(file_path)
107 storage_options = {
108 key: value for key, value in options.items() if key not in recreate.defaults
109 }
--> 110 sink = uproot.sink.file.FileSink(file_path, **storage_options)
111 compression = options.pop("compression", create.defaults["compression"])
113 initial_directory_bytes = options.pop(
114 "initial_directory_bytes", create.defaults["initial_directory_bytes"]
115 )
File ~/miniforge3/envs/work/lib/python3.11/site-packages/uproot/sink/file.py:53, in FileSink.__init__(self, urlpath_or_file_like, **storage_options)
51 else:
52 if not self._file_exists(urlpath_or_file_like, **storage_options):
---> 53 self._truncate_file(urlpath_or_file_like, **storage_options)
55 self._open_file = fsspec.open(
56 urlpath_or_file_like, mode="r+b", **storage_options
57 )
File ~/miniforge3/envs/work/lib/python3.11/site-packages/uproot/sink/file.py:81, in FileSink._truncate_file(cls, urlpath, **storage_options)
79 parent_directory = fs.sep.join(local_path.split(fs.sep)[:-1])
80 fs.mkdirs(parent_directory, exist_ok=True)
---> 81 fs.touch(local_path, truncate=True)
File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
115 @functools.wraps(func)
116 def wrapper(*args, **kwargs):
117 self = obj or args[0]
--> 118 return sync(self.loop, func, *args, **kwargs)
File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
101 raise FSTimeoutError from return_result
102 elif isinstance(return_result, BaseException):
--> 103 raise return_result
104 else:
105 return return_result
File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
54 coro = asyncio.wait_for(coro, timeout=timeout)
55 try:
---> 56 result[0] = await coro
57 except Exception as ex:
58 result[0] = ex
File ~/miniforge3/envs/work/lib/python3.11/site-packages/fsspec_xrootd/xrootd.py:380, in XRootDFileSystem._touch(self, path, truncate, **kwargs)
376 status, _ = await _async_wrap(self._myclient.truncate)(
377 path, size=0, timeout=self.timeout
378 )
379 if not status.ok:
--> 380 raise OSError(f"File not touched properly: {status.message}")
381 else:
382 len = await self._info(path)
OSError: File not touched properly: [ERROR] Server responded with an error: [3013] Unable to truncate (null); Operation not supported
The below reproducers work fine locally but will fail if "dummy" becomes a remote xrootd path.
import awkward as ak
import dask_awkward as dak
import uproot
# Eager
array = ak.Array({"x":[1,2,3], "y":[4,5,6]})
file = uproot.recreate("dummy.root")
file["tree"] = array
# Lazy
x = dak.from_awkward(array, 3)
uproot.dask_write(x, "dummy", prefix="whatever", compute=True)
@jpivarski when you find some time, please look into this as well.
Also mentioning @martindurant cause I don't know if fsspec is relevant to this.
The exception seems to be coming from xrootd, not fsspec. I can't say why the file didn't truncate.
@amadio gave us an alternative in https://github.com/xrootd/xrootd/issues/2304#issuecomment-2273394816, which can replace
https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L70-L81
(which calls truncate in fsspec-xrootd). It is only ever called from
https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L52-L53
@amadio's alternative involves opening the file with a different set of flags.
@maxgalli, do you think that can be implemented in Uproot, or does that fix need to go into fsspec-xrootd? I was at first thinking that we could do it, but after looking at the code references, it doesn't look like we have direct access to the XRootD.client.File object (anymore, now that we've adopted fsspec).
@amadio gave us an alternative in xrootd/xrootd#2304 (comment), which can replace
https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L70-L81
(which calls truncate in fsspec-xrootd). It is only ever called from
https://github.com/scikit-hep/uproot5/blob/aa8b94f722982c6eb418f9f32273152039836fb2/src/uproot/sink/file.py#L52-L53
@amadio's alternative involves opening the file with a different set of flags.
@maxgalli, do you think that can be implemented in Uproot, or does that fix need to go into fsspec-xrootd? I was at first thinking that we could do it, but after looking at the code references, it doesn't look like we have direct access to the
XRootD.client.Fileobject (anymore, now that we've adopted fsspec).
Hi @jpivarski, sorry for the late reply. It seems to me that implementing Guillerme's suggestion into fsspec-xrootd might be the way to go, but I'm not sure as I'm not familiar with the code yet. I will look into this in more detail in a few weeks from now, after thesis submission (I would thus keep the issue open for now if it is ok).
I'm confused, why do we need to have truncate=True in the touch command if it is only called when the file doesn't exist?
Since you later want to open a file with truncate on open, I would suggest to use that interface in fsspec-xrootd. Open the file with mode w once to touch/create/truncate it, then open it for r+b later.
Furthermore, I suspect interacting in read-write mode with a remote filesystem is not a great idea. We should encourage users to add the simplecache:: (see fsspec docs) prefix to the destination so a local temporary file can be opened, edited and finalized, before it is written once to the remote server on close.
Hi Nick, I came back to this, doing some investigation and trying to follow your suggestion. However, there are a few things I don't understand.
If I try to first touch the file and then re-open it in r+b mode, I get a raise NotImplementedError("File mode not supported") error rising from here. This makes sense, as r+b is not considered in this set.
Moreover, I would have naively expected r+b to be assigned OpenFlags.UPDATE in these lines, but instead it seems to fall into the last category. Is this behavior intended?
Ah, this might be a mistake, perhaps we can allow a mode? Though, with a we can only append.
I'm not sure what the comment update "+" mode removed for now since seek() is read only refers to. Does xrootd support rw seek?
Did you try the simplecache::? Then we would only write to the remote storage once.
Ah, this might be a mistake, perhaps we can allow
amode? Though, withawe can only append. I'm not sure what the commentupdate "+" mode removed for now since seek() is read onlyrefers to. Does xrootd support rw seek?
I think the comment might refer to this: the inherited seek function raises an error if not in rb mode. However, I thought it could maybe be implemented in fsspec-xrootd, in order to change the behavior?
As for xrootd, I looked around and it should support rw seek.
Did you try the
simplecache::? Then we would only write to the remote storage once.
Not yet, I'm still trying to find a combination of changes in uproot/fsspec/fsspec-xrootd that fixes the bug - will add it here.