Support for Concurrent Access with XAM.jl: Reader/Writer Safety and Locking Considerations

Open abhinavsns opened this issue 1 year ago • 0 comments

Hello,

I'm working on a distributed Julia application where multiple workers (processes) need to access the same BAM file concurrently, each reading different intervals using an associated index file. Specifically, I create separate XAM.BAM.Reader instances on different workers and have them read from the same BAM file but from different genomic regions.
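For concreteness, the setup looks roughly like the sketch below. The file name, the regions, and the helper `count_overlaps` are hypothetical placeholders, and I am assuming that `eachoverlap` can be imported from GenomicFeatures and that the index sits next to the BAM file as `<path>.bai`. Each call opens its own `BAM.Reader`, so no reader object is shared between workers.

```julia
using Distributed
addprocs(2)  # two local worker processes, for illustration only

@everywhere using XAM
# Assumption: eachoverlap is provided by GenomicFeatures and extended by XAM.
@everywhere using GenomicFeatures: eachoverlap

# Hypothetical regions: (reference name, coordinate range)
regions = [("chr1", 1:1_000_000), ("chr1", 1_000_001:2_000_000)]

# Opens a separate reader (and file handle) on whichever worker runs the call,
# counts the records overlapping one region, then closes the reader.
@everywhere function count_overlaps(bam_path, chrom, range)
    reader = open(BAM.Reader, bam_path; index = bam_path * ".bai")
    try
        n = 0
        for _ in eachoverlap(reader, chrom, range)
            n += 1
        end
        return n
    finally
        close(reader)
    end
end

# Distribute one region per task across the workers.
function count_all(bam_path, regions)
    return pmap(r -> count_overlaps(bam_path, r[1], r[2]), regions)
end

bam_path = "sample.sorted.bam"  # hypothetical path
if isfile(bam_path)
    println(count_all(bam_path, regions))
end
```

The readers only ever open the file for reading, and nothing else writes to it while the workers are running.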

I have a few questions regarding this use case:

Thread/Process Safety: Is the XAM.BAM.Reader thread-safe or process-safe when multiple instances on different workers read from the same BAM file concurrently but access different intervals? Are there any specific considerations or potential issues I should be aware of when doing this?

Writing Concurrently: In the same setup, I am considering using XAM.BAM.Writer for writing outputs. How should writing be handled when multiple workers might write to the same BAM file? Would it be sufficient to use file-level locks, such as:

BAM.Writer(BGZFStream(open(path, "w", lock=true), "w"))

Are there additional considerations or recommendations for safely writing to BAM files in a distributed environment?

Locking Mechanisms: If locks are necessary, what should I be careful of when implementing them? Is the file-level lock shown above sufficient for ensuring data integrity, or would additional locking strategies be required for BGZFStream?

Your guidance on this would be greatly appreciated. Thank you for your amazing work on XAM.jl!

abhinavsns, Aug 13 '24 06:08