ir_datasets

Lock for writing to the cache files

Open eugene-yang opened this issue 2 years ago • 2 comments

I sometimes have multiple processes or multiple machines accessing the same storage cluster that hosts the ir_datasets cache directory. If several processes decide to download the same dataset at the same time, they all start writing to the same file and eventually crash. It would be nice to have a locking mechanism that lets only one process write to a given file and makes the other processes wait.
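To make the request concrete, here is a minimal sketch of the kind of lock I have in mind. It is not ir_datasets code: `cache_lock`, `download_once`, the `.lock` / `.tmp` suffixes, and the `fetch` callable are all hypothetical names used for illustration. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the lock file already exists, and on `os.replace` to move the finished file into place. How reliable exclusive creation is on a particular network filesystem is an assumption that would need checking, and a crashed process would leave a stale lock file behind.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def cache_lock(target_path, poll_seconds=0.5, timeout=None):
    """Hold an exclusive lock for target_path using a sibling '.lock' file.

    Hypothetical helper for illustration; not part of ir_datasets.
    """
    lock_path = target_path + ".lock"
    start = time.monotonic()
    while True:
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the lock file exists,
            # so at most one process succeeds in creating it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if timeout is not None and time.monotonic() - start > timeout:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_seconds)  # another process holds the lock: wait
    try:
        # Record the owner's PID to help with debugging stale locks.
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))
        yield lock_path
    finally:
        os.remove(lock_path)  # release the lock


def download_once(target_path, fetch):
    """Write fetch() to target_path unless another process already did.

    `fetch` is a hypothetical callable returning the file contents as bytes.
    """
    with cache_lock(target_path):
        # Re-check under the lock: a process that waited finds the file done.
        if not os.path.exists(target_path):
            tmp_path = target_path + ".tmp"
            with open(tmp_path, "wb") as out:
                out.write(fetch())
            # Rename so readers never see a partially written file.
            os.replace(tmp_path, target_path)
    return target_path
```

With something like this, the first process to arrive downloads the file, and the others block in `cache_lock` until it finishes, then see that the file exists and skip the download.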

eugene-yang • Mar 26 '23 19:03

Thanks for reporting! I’ll look into it.

seanmacavaney • Mar 26 '23 20:03

Yes, I have the same issue with downloading, but also with other processes such as building the docstore.

bpiwowar • Jul 07 '23 07:07