thread safety in faidx_fetch_seq
I am getting unexpected behavior in faidx_fetch_seq when I run with multiple threads, using the pthread library for multithreading. I am linking libhts.a along with C++ code, using the -pthread flag. I have a single faidx_t that is loaded once and then passed to each call to faidx_fetch_seq running on different threads. Since faidx_fetch_seq doesn't modify the index, I was expecting this to be thread-safe.
If I run with only a single thread or using a mutex lock, I have no issues. Otherwise, the output is unpredictable.
Am I missing something about how to use faidx_fetch_seq in a multi-threaded program, or do I need to compile/link htslib differently for this purpose?
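char *f = faidx_fetch_seq(index, const_cast<char*>(chr_name.c_str()), p1, p2, &len);
if (f == nullptr) return std::string(); // fetch failed; constructing std::string from a null pointer is undefined behavior
std::string out(f); // occasionally an empty string, but only when run multi-threaded
free(f);
return out;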
Calling fai_load() reads the contents of the .fai file into memory and holds a file pointer to the .fa file open (and this in-memory index and the file pointer constitute the faidx_t returned).
Calling fai_fetch() or faidx_fetch_seq() does I/O on that file pointer, seeking to the right place and reading the amount of sequence requested. This currently happens without any locking, so it's unsurprising that it is not thread-safe.
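To illustrate the seek-then-read pattern described above, here is a minimal sketch that uses a plain std::FILE* as a stand-in for the file pointer held by the index. SharedReader, read_region and demo are hypothetical names for this illustration only and are not part of htslib. The point is that the seek and the read are two separate steps on one shared file position, so they have to be made atomic with respect to other threads.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a faidx_t-like handle: one open file shared by all threads.
struct SharedReader {
    std::FILE *fp = nullptr;  // the shared file position lives inside this handle
    std::mutex lock;          // guards the seek + read pair
};

// Read up to n bytes starting at byte offset. Without the lock, another thread
// could seek between our fseek and fread and we would read the wrong region.
std::string read_region(SharedReader &r, long offset, std::size_t n) {
    assert(r.fp != nullptr);
    std::lock_guard<std::mutex> guard(r.lock);
    if (std::fseek(r.fp, offset, SEEK_SET) != 0) return std::string();
    std::string buf(n, '\0');
    const std::size_t got = std::fread(&buf[0], 1, n, r.fp);
    buf.resize(got);
    return buf;
}

// Four threads each read their own 4-byte region from the same handle.
bool demo() {
    SharedReader r;
    r.fp = std::tmpfile();
    if (r.fp == nullptr) return false;
    const std::string data = "AAAACCCCGGGGTTTT";
    std::fwrite(data.data(), 1, data.size(), r.fp);
    std::fflush(r.fp);

    std::vector<std::string> out(4);
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&r, &out, i] { out[i] = read_region(r, 4L * i, 4); });
    for (auto &w : workers) w.join();
    std::fclose(r.fp);
    return out[0] == "AAAA" && out[1] == "CCCC" && out[2] == "GGGG" && out[3] == "TTTT";
}
```

Removing the lock_guard line makes the result depend on thread scheduling, which matches the unpredictable output reported in the question.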
So we need to either document that these functions are not thread-safe and that user code must do its own locking, or add suitable locking to the faidx_t type.
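For the first option (user code does its own locking), a wrapper along the following lines might be enough. This is only a sketch: LockedFetcher, RawFetch and fake_fetch are hypothetical names invented here, and fake_fetch is a stand-in so the example runs without htslib. In real code the callable would forward to faidx_fetch_seq with the shared index bound in.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Shape of a faidx_fetch_seq-like call with the index already bound:
// returns a malloc'd, NUL-terminated buffer (or nullptr) and sets *len.
using RawFetch = std::function<char *(const char *name, int beg, int end, int *len)>;

// Hypothetical wrapper: every fetch goes through one mutex.
class LockedFetcher {
public:
    explicit LockedFetcher(RawFetch fetch) : fetch_(std::move(fetch)) { assert(fetch_); }

    // Returns the sequence, or an empty string if the underlying call fails.
    std::string fetch(const std::string &name, int beg, int end) {
        int len = 0;
        char *raw = nullptr;
        {
            std::lock_guard<std::mutex> guard(mutex_);  // only the I/O call is serialized
            raw = fetch_(name.c_str(), beg, end, &len);
        }
        if (raw == nullptr) return std::string();
        std::string out(raw);  // copy outside the lock
        std::free(raw);
        return out;
    }

private:
    RawFetch fetch_;
    std::mutex mutex_;
};

// Stand-in for the real call (assumption: 0-based inclusive coordinates).
char *fake_fetch(const char *name, int beg, int end, int *len) {
    if (std::strcmp(name, "chr1") != 0 || end < beg) { *len = -1; return nullptr; }
    const std::size_t n = static_cast<std::size_t>(end - beg + 1);
    char *buf = static_cast<char *>(std::malloc(n + 1));
    if (buf == nullptr) { *len = -1; return nullptr; }
    std::memset(buf, 'A', n);
    buf[n] = '\0';
    *len = static_cast<int>(n);
    return buf;
}
```

One LockedFetcher would be shared by all threads. Holding the lock only around the fetch call keeps the string copy and free out of the critical section, though all file reads are still serialized.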
Thank you for the explanation @jmarshall, that is very helpful. I was tricked by the const-ness of the faidx_t pointer, but if it is moving around a file pointer that would explain it. I have now taken to just locking my code before calling faidx_fetch_seq, and there are no problems.
@jmarshall I've been persistently frustrated by the thread-unsafety of the faidx_t API, so I implemented a header-only extension to htslib that provides fully reentrant and thread-safe access to a single .fai/BGZF index: https://github.com/waveygang/faigz