ir_datasets
Lock for writing to the cache files
I sometimes have multiple processes or multiple machines accessing the same storage cluster that hosts the cache directory of ir_datasets. If multiple processes decide to download the same dataset at the same time, they start writing to the same file and eventually crash.
It would be nice to have a locking mechanism that prevents more than one process from writing to the same file and makes the other processes wait.
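To illustrate the request, here is a minimal sketch of one way such a lock could work. It is not how ir_datasets is implemented. The names `file_lock` and `download_once`, the `.lock` and `.tmp` suffixes, and the `fetch` callback are all hypothetical. The sketch relies on exclusive file creation (`O_CREAT | O_EXCL`) being atomic on the shared filesystem, which is an assumption that should be checked for the storage cluster in use.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(target_path, poll_interval=0.1, timeout=None):
    """Hypothetical helper: hold an exclusive lock by creating target_path + '.lock'."""
    lock_path = target_path + ".lock"
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists,
            # so only one process can succeed in creating it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            # Another process holds the lock: wait and retry.
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock so waiting processes can proceed.
        os.remove(lock_path)


def download_once(target_path, fetch):
    """Hypothetical helper: write fetch() to target_path unless it already exists.

    Returns True if this call wrote the file, False if it was already there.
    """
    with file_lock(target_path):
        if os.path.exists(target_path):
            # Another process finished the download while we were waiting.
            return False
        tmp_path = target_path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(fetch())
        # Rename into place so readers never see a partially written file.
        os.replace(tmp_path, target_path)
        return True
```

A caveat of this approach: if a process crashes while holding the lock, the `.lock` file is left behind and has to be cleaned up manually or through some staleness check, which the sketch does not attempt. The same pattern could in principle wrap other long-running cache writes, not just downloads.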
Thanks for reporting! I’ll look into it.
Yes, I have the same issue with downloading, and also with other processes such as building the docstore.