BandageNG

Minimap2 support

Open paoloczi opened this issue 2 years ago • 6 comments

It's really great that Bandage gets a new life - thank you for taking that up.

As a possible enhancement, I suggest minimap2 support. Currently you can map the assembly using blast and display those super useful rainbow plots, but blast is getting a bit tired these days.

If you add this support, it would be useful to allow both mapping on the fly (like currently for Blast), as well as the use of a preexisting paf file written by minimap2.

paoloczi avatar Jun 02 '22 16:06 paoloczi

Hello

Yes, we thought about this. Fortunately, there is a workaround already! Convert PAF into BED and load BED.
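
A minimal sketch of that PAF-to-BED conversion, for illustration only. It assumes the PAF was produced by mapping queries against the graph's node sequences, so the PAF target name is the node name and becomes the BED `chrom` column; the function name `paf_to_bed` is hypothetical and not part of Bandage or minimap2. Both formats use 0-based, half-open coordinates, so the target interval can be copied unchanged.

```python
def paf_to_bed(paf_lines):
    """Yield one BED6 line per PAF record.

    Assumption: the PAF target (columns 6-9) is a graph node, so the
    target interval is what gets painted on the node.
    """
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"PAF record has fewer than 12 columns: {line!r}")
        query_name, strand = fields[0], fields[4]
        target_name = fields[5]
        target_start, target_end = int(fields[7]), int(fields[8])
        mapq = int(fields[11])
        # BED6: chrom, start, end, name, score, strand
        yield "\t".join(
            [target_name, str(target_start), str(target_end),
             query_name, str(mapq), strand]
        )
```

Usage would be along the lines of writing `"\n".join(paf_to_bed(open("hits.paf")))` to a `.bed` file, where `hits.paf` is a placeholder file name.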

asl avatar Jun 02 '22 16:06 asl

Can you do rainbow plots from BED? If so, that is a reasonable temporary solution until you can implement native minimap2 support.

paoloczi avatar Jun 02 '22 16:06 paoloczi

Just adding my vote:

Considering that you need to do some significant logistics to make this work, I would also love to have a fully integrated mapper that works for long sequences.

When using Bandage as a browser, it's convenient to map contigs or "Regions of Interest" on the fly, and converting GFA to FASTA, switching to the command line, then doing PAF->BED for each query is a bit cumbersome.

rlorigro avatar Jun 02 '22 16:06 rlorigro

@paoloczi The rainbow scheme does not make much sense for BED, as there is no "query" there.

@rlorigro Is BLAST not sensitive enough, or what are the problems that could be better solved using minimap2 in your case?

asl avatar Jun 02 '22 16:06 asl

In my experience, Blast tends to be very slow, and it often breaks up mappings in undesirable ways. minimap2 offers much better control.

paoloczi avatar Jun 02 '22 16:06 paoloczi

Where it fails is particularly in longer mappings/alignments. Sometimes it will just never finish running. Long mappings are useful if you want to map a nanopore read to an assembly, for example, or if you want to find a repetitive region of interest using a reference excerpt, which needs to be long to map accurately or to span duplicates of a gene.

rlorigro avatar Jun 02 '22 17:06 rlorigro

Here is a teaser. The UI still says BLAST, but that is only the UI: under the hood it is minimap2, aligning C4 sequences to the C4A/C4B graph from https://zenodo.org/record/6617246

Screenshot 2022-08-09 at 17 52 11

The second one is a bit more extreme, as we're aligning the whole MHC. I would not try this with BLAST:

Screenshot 2022-08-09 at 18 17 33

Still, I would suggest specialized graph alignment tools like GraphAligner, PathRacer or SPAligner for sequence-to-graph / HMM-to-graph alignment :)

asl avatar Aug 09 '22 16:08 asl

Out of curiosity, did you use minimap2 through its API, or by calling it as an executable and using IO?

rlorigro avatar Aug 09 '22 22:08 rlorigro

The binary is executed and the final PAF is parsed. Essentially, the existing BLAST support was heavily refactored and generalized to allow different ways of obtaining "query hits", regardless of how they are obtained :)
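
The "execute the binary, parse the PAF" approach can be sketched roughly as below. This is not BandageNG's actual code; the names `Hit`, `parse_paf` and `run_minimap2` are hypothetical, and the preset `map-ont` is just an example value. It relies only on minimap2 writing PAF to standard output by default and on the standard `-x` preset option.

```python
import subprocess
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One query hit on a graph node (hypothetical structure)."""
    query: str
    query_start: int
    query_end: int
    strand: str
    node: str
    node_start: int
    node_end: int


def parse_paf(text: str) -> List[Hit]:
    """Turn PAF text into a list of hits, ignoring optional tag columns."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], int(f[2]), int(f[3]), f[4],
                        f[5], int(f[7]), int(f[8])))
    return hits


def run_minimap2(nodes_fasta: str, queries_fasta: str,
                 preset: str = "map-ont",
                 binary: str = "minimap2") -> List[Hit]:
    """Run the minimap2 executable and parse the PAF it prints to stdout.

    Assumes `binary` is on PATH and that the node sequences were
    exported to `nodes_fasta` beforehand.
    """
    result = subprocess.run(
        [binary, "-x", preset, nodes_fasta, queries_fasta],
        capture_output=True, text=True, check=True,
    )
    return parse_paf(result.stdout)
```

Keeping the parser separate from the process call means any tool that emits PAF could feed the same hit list, which matches the idea of generalizing over how hits are obtained.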

asl avatar Aug 09 '22 22:08 asl

One needs to know, though, that the hit-combining and path-building approaches in Bandage are essentially brute-force :(

asl avatar Aug 09 '22 22:08 asl

For minimap2, isn't that more trivial, because you have a "primary" alignment and you can just follow all the supplementaries in order of their query coordinates?

rlorigro avatar Aug 09 '22 22:08 rlorigro

Well, not quite :) Overall, the problem is similar to the one we have in graph alignment: we have a set of "seed alignments" to the nodes of the graph and we need to chain them properly. Note that some seeds could be missed and some could be misplaced (think about an alignment through a repetitive region of the graph, where multiple hits of the same query region are possible, etc.).

So, if the proper path is required, other tools have to be used: the ones that do proper graph alignment. After all, we do support GAF loading these days, as well as other formats.

asl avatar Aug 09 '22 22:08 asl

Hmm ok, well, I agree that chaining should not really be in the scope of Bandage.

rlorigro avatar Aug 09 '22 22:08 rlorigro

It won't matter for assembly graphs, but aligning to the paths will be much more sensitive in the case of variation graphs. The alignments can be made against the path sequences of the graph and then injected into the node space.
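
As a rough illustration of "injecting" a path alignment into node space, the sketch below splits an interval given in path coordinates into per-node intervals. It is a simplification under stated assumptions: the path is a list of `(node_name, node_length)` pairs traversed in forward orientation, adjacent nodes do not overlap, and the function name `inject` is hypothetical.

```python
def inject(path, start, end):
    """Map the half-open path interval [start, end) onto node intervals.

    path: list of (node_name, node_length) in path order.
    Returns a list of (node_name, node_start, node_end), 0-based half-open.
    Assumes forward-oriented nodes and no overlap between adjacent nodes.
    """
    pieces = []
    offset = 0  # path coordinate at which the current node begins
    for name, length in path:
        node_lo, node_hi = offset, offset + length
        lo, hi = max(start, node_lo), min(end, node_hi)
        if lo < hi:  # the alignment touches this node
            pieces.append((name, lo - offset, hi - offset))
        offset = node_hi
    return pieces
```

For a path `[("a", 100), ("b", 50), ("c", 200)]`, an alignment covering path positions 80 to 170 touches the end of `a`, all of `b`, and the first 20 bases of `c`.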

ekg avatar Aug 10 '22 05:08 ekg

@ekg Actually, it does matter if we need to span a complex repeat :) E.g. in PathRacer we have an option of seeding from paths rather than nodes, and it makes a huge difference in complex repetitive regions.

I also thought about adding alignment to the paths. But I'd likely rewrite chaining into something a bit more efficient (currently there is a limit of 6 nodes in the "query path" to limit the combinatorial explosion of the brute-force approach :) )
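
To give a feel for why a node cap is needed, here is a toy sketch of brute-force path enumeration over nodes that received hits. It is not Bandage's implementation: the function name `enumerate_paths`, the adjacency-dict graph representation, and the rule that a node is not revisited are all assumptions made for illustration; only the default cap of 6 nodes comes from the comment above.

```python
def enumerate_paths(adjacency, hit_nodes, max_nodes=6):
    """Enumerate every simple path that uses only hit nodes.

    adjacency: dict mapping node -> iterable of successor nodes.
    hit_nodes: set of nodes that have at least one query hit.
    max_nodes: cap on path length, which bounds the search depth.
    """
    paths = []

    def extend(path):
        paths.append(list(path))
        if len(path) == max_nodes:
            return  # cap reached: stop extending this candidate
        for nxt in adjacency.get(path[-1], ()):
            # Assumption: never revisit a node within one path.
            if nxt in hit_nodes and nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    for start in sorted(hit_nodes):
        extend([start])
    return paths
```

The number of candidates grows with the branching factor raised to the path length, so in a tangled repeat even a small increase in `max_nodes` can multiply the work many times over.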

asl avatar Aug 10 '22 08:08 asl

@ekg FWIW https://github.com/asl/BandageNG/pull/114 implements alignment to the paths, query paths are automatically built out of the,

asl avatar Aug 24 '22 17:08 asl