Overlapping contigs - ratio < 10%

Open zorino opened this issue 12 years ago • 1 comments

When 2 contigs overlap and the ratio of the matching region is < 10%, Ray won't merge them.

Example:

                                      Overlap=9142/9142 (100%)

-----------------------------------------------------> contig-28 length= 197810
                                      <----------------------------------- contig-45 length= 94175
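
As a rough check of the numbers in this example, the sketch below computes the overlap as a fraction of each contig's length. The function names and the choice of contig length as the denominator are assumptions made for illustration; this is not Ray's actual code, and Ray may define the ratio differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from Ray): fraction of a contig covered by the overlap.
double overlapRatio(std::size_t overlapLength, std::size_t contigLength) {
    assert(contigLength > 0);  // a contig must have a non-zero length
    return static_cast<double>(overlapLength) / static_cast<double>(contigLength);
}

// Hypothetical rule: the overlap must cover at least minimumRatio of the contig.
// The default of 0.10 mirrors the 10% threshold discussed in this issue.
bool passesThreshold(std::size_t overlapLength, std::size_t contigLength,
                     double minimumRatio = 0.10) {
    return overlapRatio(overlapLength, contigLength) >= minimumRatio;
}
```

Under these assumptions, the 9142-base overlap covers about 4.6% of contig-28 (197810) and about 9.7% of contig-45 (94175), so both fall below 10%.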

zorino, Jun 20 '12 14:06

Hello Zorino,

The file ParallelPaths.txt contains all the paths that were computed in parallel.

Ray removes the redundancy by eliminating paths included in other longer paths.
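
The containment rule can be sketched roughly as follows. This is a simplified illustration, not Ray's implementation: paths are modeled as plain strings, `removeContainedPaths` is a hypothetical name, and reverse complements are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: drop every path that occurs as a substring of a path
// that is at least as long. Exact duplicates are reduced to a single copy.
std::vector<std::string> removeContainedPaths(std::vector<std::string> paths) {
    // Sort longest first, so each path only needs to be compared
    // against the paths that have already been kept.
    std::stable_sort(paths.begin(), paths.end(),
                     [](const std::string& a, const std::string& b) {
                         return a.size() > b.size();
                     });

    std::vector<std::string> kept;
    for (const std::string& candidate : paths) {
        bool contained = false;
        for (const std::string& longer : kept) {
            if (longer.find(candidate) != std::string::npos) {
                contained = true;
                break;
            }
        }
        if (!contained) {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```

Note that two paths which only partially overlap both survive this filter, which is the situation described in this issue.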

But sometimes paths overlap, because path traversability in a non-bidirectional de Bruijn subgraph is a non-symmetric property for the algorithms used by Ray. This means that sometimes you cannot cross region B starting from region A, but you can cross region B starting from region C (see drawing below).


region A region B region C

In those cases, there will be overlapping paths (or contigs). In numerous cases, the overlap is rather long, so the 10% rule is not a limiting factor.

But it seems that in your case it is.

If you are skilled in C++, the concerned code is inside:

plugin: JoinerTaskCreator
class: JoinerWorker
file: code/plugin_JoinerTaskCreator/JoinerWorker.cpp
line: 411

Just lowering the threshold will solve the problem, but will probably not be safe.

In my opinion there should be additional testing to check if the overlap is a repeat.
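
One possible shape for such a test, purely as an assumption-laden sketch: count how often the overlap sequence occurs across all contigs, and treat it as a likely repeat if it appears anywhere beyond the two contigs being joined. The names `countOccurrences` and `overlapLooksRepeated` are hypothetical, contigs are modeled as plain strings, and reverse complements are ignored; Ray works on a distributed de Bruijn graph, so a real check would look different.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Count (possibly overlapping) occurrences of pattern in text.
std::size_t countOccurrences(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) {
        return 0;
    }
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        ++count;
    }
    return count;
}

// Hypothetical heuristic: the overlap is expected exactly once in each of the
// two contigs being joined, so more than two occurrences overall suggests
// that the overlapping sequence is a repeat.
bool overlapLooksRepeated(const std::string& overlap,
                          const std::vector<std::string>& allContigs) {
    std::size_t total = 0;
    for (const std::string& contig : allContigs) {
        total += countOccurrences(contig, overlap);
    }
    return total > 2;
}
```

With a check along these lines, a lowered threshold would only apply to overlaps that do not look repeated.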

Some documentation should you want to work on this:

Documentation/CodingStyle.txt
Documentation/Submit-a-patch.txt

sebhtml, Jun 20 '12 15:06