make_bag is not thread safe

Open jcushman opened this issue 7 years ago • 1 comment

Creating multiple bags in threads doesn't work:

import bagit
from multiprocessing.pool import ThreadPool
ThreadPool().map(bagit.make_bag, ('dirA', 'dirB'))

This fails with a FileNotFoundError because make_bag calls os.chdir. The current working directory is shared by the whole process rather than being per-thread, so the two threads change directories on each other while bagging.
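To illustrate the underlying problem without bagit, here is a minimal sketch using only the standard library. The helper name `cwd_seen_by_main_after_thread_chdir` is made up for this example; it shows that an os.chdir call made in one thread is visible from another thread.

```python
import os
import tempfile
import threading


def cwd_seen_by_main_after_thread_chdir():
    """Return (directory the worker switched to, cwd the main thread sees afterwards)."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.realpath(tmp)
        # The worker thread does nothing except change directory.
        worker = threading.Thread(target=os.chdir, args=(target,))
        worker.start()
        worker.join()
        # The main thread never called chdir, yet its cwd has moved.
        seen = os.path.realpath(os.getcwd())
        # Restore the cwd before the temporary directory is removed.
        os.chdir(original)
    return target, seen


target, seen = cwd_seen_by_main_after_thread_chdir()
print(seen == target)
```

Any function that relies on relative paths after a chdir can therefore be disturbed by a chdir in a different thread.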

I see there's already a note in the code to stop using chdir: # FIXME: if we calculate full paths we won't need to deal with changing directories. I just wanted to add in particular that the current code prevents multithreading.
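As a rough sketch of what the FIXME suggests, checksums can be computed from full paths so that no directory change is needed. This is not bagit's code: `manifest_entries` is a hypothetical helper, and SHA-256 is just an assumed choice of algorithm for the illustration.

```python
import hashlib
import os
import tempfile
from multiprocessing.pool import ThreadPool


def manifest_entries(bag_dir):
    """Hypothetical helper: sorted (relative_path, sha256_hex) pairs, no chdir."""
    bag_dir = os.path.abspath(bag_dir)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(bag_dir):
        for filename in filenames:
            # Always open files by full path; the cwd is never touched.
            full_path = os.path.join(dirpath, filename)
            with open(full_path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            rel = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
            entries.append((rel, digest))
    return sorted(entries)


def demo():
    """Hash two throwaway directories from two threads at once."""
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for name in ("dirA", "dirB"):
            path = os.path.join(root, name)
            os.mkdir(path)
            with open(os.path.join(path, "payload.txt"), "wb") as handle:
                handle.write(name.encode("ascii"))
            dirs.append(path)
        with ThreadPool(2) as pool:
            return pool.map(manifest_entries, dirs)


manifests = demo()
```

Because nothing here changes the working directory, the two threads cannot interfere with each other through it.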

(Using a process pool instead of a thread pool would work around this issue, but doesn't help in my particular case because my worker threads need to share memory.)
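One possible stopgap that keeps everything in threads is to serialize the chdir-based call behind a lock. The sketch below is an assumption-laden illustration, not something bagit provides: `serialized` and `_chdir_lock` are names invented here, and `chdir_and_list` is a stand-in that changes directory the way make_bag is described as doing.

```python
import functools
import os
import tempfile
import threading
from multiprocessing.pool import ThreadPool

# One process-wide lock, because the cwd is process-wide state.
_chdir_lock = threading.Lock()


def serialized(func):
    """Wrap func so that only one thread at a time can run it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _chdir_lock:
            return func(*args, **kwargs)
    return wrapper


def chdir_and_list(directory):
    """Hypothetical stand-in for a chdir-based function such as make_bag."""
    old = os.getcwd()
    os.chdir(directory)
    try:
        return sorted(os.listdir("."))
    finally:
        os.chdir(old)


def demo():
    """Run the wrapped stand-in on two throwaway directories from two threads."""
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for name in ("dirA", "dirB"):
            path = os.path.join(root, name)
            os.mkdir(path)
            open(os.path.join(path, name + ".txt"), "w").close()
            dirs.append(path)
        with ThreadPool(2) as pool:
            return pool.map(serialized(chdir_and_list), dirs)


results = demo()
print(results)
```

With bagit installed, the same wrapper could presumably be applied as `ThreadPool().map(serialized(bagit.make_bag), ('dirA', 'dirB'))`. The trade-off is that bagging itself no longer runs in parallel, and any other code in the process that depends on the working directory while the lock is held is still affected.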

jcushman, Aug 23 '18 16:08

Indeed – I was also looking at a more comprehensive fix so we could also start supporting non-POSIX interfaces such as S3 in https://github.com/acdha/bagit-python/tree/flexible-fileio, but I haven't worked on that in a while.

acdha, Aug 23 '18 16:08