strobemers icon indicating copy to clipboard operation
strobemers copied to clipboard

OMP parallellism generates wrong strobe<--> read associations in the index

Open blinard-BIOINFO opened this issue 1 year ago • 1 comments

@ksahlin

Fix: we had to comment this line in main.cpp to cancel parallelism:

//#pragma omp parallel for num_threads(n_threads) shared(read_cnt, output_file, q_id) private(acc,seq_rc, query_mers,query_mers_rc)

Bug observed with parallelism:

  • For a set of 100 reads of 100bp, we expected around 100 strobemers associated to each query in the computed index.
  • Observation: One read was associated to 300 strobemers in the index (3 times more than expected), two reads were associated to only ~50 strobemers.

It seems that the omp pragma is incorrect and variables q_id or read_cnt have the wrong value or are read incorrectly by the threads (probably updated by another threads beforehand?).

As a consequence, the full set of strobemers that should be associated to a given query can be actually partially associated to a other queries in the final index. Because is it related to parallelism, it is hard to reproduce with an example.

Maybe variables used query identifiers should be made more atomic ?

blinard-BIOINFO avatar Apr 04 '24 14:04 blinard-BIOINFO

I see, thanks for the report!

In more optimized implementations of the multiple query mapping problem we have removed OpenMP and are instead relying on std::thread - which makes the parallelisation more efficient, especially as the number of cores increase, using a "single producer multiple consumers" solution, e.g., as in strobealign.

Nevertheless, it is good to know this for future projects using the implementation in this repository. The easy way out for me is to, like you, deactivate parallelisation in this repo. I may add a PC solution later if I have time or find someone to implement it with threads.

ksahlin avatar Apr 08 '24 07:04 ksahlin