File error in parallel mode
When running a datachain query in parallel mode and a file operation fails (prefetch/download/cache), the original error is masked by a pickle/unpickle exception:
File "/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/multiprocess/queues.py", line 138, in get_nowait
return self.get(False)
~~~~~~~~^^^^^^^
File "/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/multiprocess/queues.py", line 125, in get
return _ForkingPickler.loads(res)
~~~~~~~~~~~~~~~~~~~~~^^^^^
File "/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/dill/_dill.py", line 303, in loads
return load(file, ignore, **kwds)
File "/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/dill/_dill.py", line 289, in load
return Unpickler(file, ignore=ignore, **kwds).load()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/dill/_dill.py", line 444, in load
obj = StockUnpickler.load(self)
TypeError: FileError.__init__() missing 1 required positional argument: 'message'
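This unpickling failure can be reproduced outside datachain with any exception class whose required __init__ argument is not forwarded to Exception.__init__: pickle (and dill, which multiprocess uses, follows the same protocol) records self.args and replays them on load, so an empty args tuple means __init__ is called with no arguments. A minimal sketch with a hypothetical BrokenFileError standing in for FileError:

import pickle

class BrokenFileError(Exception):
    # Hypothetical stand-in: the argument is stored on the instance but
    # never passed to Exception.__init__, so self.args stays empty and
    # unpickling calls __init__() with no arguments.
    def __init__(self, message: str):
        super().__init__()
        self.message = message

try:
    pickle.loads(pickle.dumps(BrokenFileError("download failed")))
except TypeError as exc:
    print(exc)  # __init__() missing 1 required positional argument: 'message'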
How to reproduce:
import datachain as dc

def process_file(file: dc.File) -> dc.File:
    file.path = "."  # corrupt the path on purpose to trigger a file error downstream
    return file

def process_path(file2: dc.File) -> int:
    print(file2.path)
    return len(file2.path)

(
    dc.read_storage("s3://bucket/")
    .limit(10)
    .settings(prefetch=0)
    .map(file2=process_file)
    .settings(prefetch=1, parallel=2)  # parallel mode with prefetch enabled
    .map(path_len=process_path)
    .save("test")
)
Note: in distributed mode (SaaS-related) there are no logs and the job gets stuck.
Update: in the CLI in parallel mode the job sometimes gets stuck as well, with no logs or errors.
The unpickling error was fixed in https://github.com/iterative/datachain/pull/1126, but the job still gets stuck in parallel mode and in SaaS.
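For reference, a common fix for this class of unpickling error (not necessarily what #1126 does) is to forward the arguments to Exception.__init__ so they land in self.args, or to define __reduce__ explicitly:

class PicklableFileError(Exception):
    # Hypothetical sketch: forwarding the argument records it in self.args,
    # so pickle can replay the constructor call on load.
    def __init__(self, message: str):
        super().__init__(message)
        self.message = message

    # Equivalent explicit alternative:
    # def __reduce__(self):
    #     return (type(self), (self.message,))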
@dreadatour is this fixed? Feel free to reopen if not.