Visualization of paths along nodes

Open rchikhi opened this issue 10 years ago • 8 comments

Hi Ryan,

Do you have plans to implement a graphical way to visualize, for example, a contig as a line that goes along multiple nodes, perhaps using CSV information?

rchikhi avatar Dec 11 '15 20:12 rchikhi

Hi Ryan,

I would also be very interested in this feature. I'm the developer of Recycler (https://github.com/rozovr/Recycler), which searches for graph cycles as potential plasmids, and being able to do this would be great for visual validation using Bandage. Dr. Holt suggested reaching out via Twitter. Currently we're outputting a txt file that includes the list of nodes from the original assembly graph so that users can highlight them in Bandage, but it would be great if you could define a format for loading a set of candidates in a single file, e.g. one per line.

Many thanks, Roye

rozovr avatar Dec 15 '15 14:12 rozovr

Rayan and Roye,

There is currently a somewhat crude method for highlighting paths in Bandage: go to the 'Output' menu and select 'Specify exact path for copy/save'. A window then pops up where you can enter a path which will be shaded on the visualisation. The format is a comma-delimited list of node names, but it is necessary to specify which strand for each node (i.e. it must end in a + or -).

However, this functionality is really only appropriate for manually examining one path at a time. I realise that a more robust means for visualising paths in Bandage would be good - something where a user could work with many paths.
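For anyone generating such a path string from a tool's output, here is a minimal Python sketch of the comma-delimited, strand-suffixed format described above. The function name `format_path` and the (node name, strand) input shape are assumptions made for illustration; they are not part of Bandage.

```python
def format_path(nodes):
    """Join (node_name, strand) pairs into a string such as '5+,10+,23-'.

    Hypothetical helper: the input shape is an assumption for this sketch.
    """
    parts = []
    for name, strand in nodes:
        # Every node must carry an explicit strand, as the dialog requires.
        if strand not in ("+", "-"):
            raise ValueError(f"node {name!r} needs a '+' or '-' strand, got {strand!r}")
        parts.append(f"{name}{strand}")
    return ",".join(parts)


# Example: build a string that could be pasted into the path dialog.
path_string = format_path([("5", "+"), ("10", "+"), ("23", "-")])
print(path_string)
```

Raising an error on a missing strand, rather than guessing one, mirrors the requirement that each node name must end in + or -.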

SPAdes 3.6.2 recently came out and its assemblies now include a '.paths' file. This shows the graph paths used to create contigs (very much like Rayan mentioned). So I'm tempted to use that format as the standard. It looks like this:

NODE_1_length_237403_cov_243.219_ID_533
22-,13-,11+,13-,12+;
29+,21-,20+,21-,6-
NODE_1_length_237403_cov_243.219_ID_533'
6+,21+,20-,21+,29-;
12-,13+,11-,13+,22+
NODE_2_length_74848_cov_232.675_ID_535
14+,26+,27-,26+,27-,26+,19-,9-,7+,9-,8+
NODE_2_length_74848_cov_232.675_ID_535'
8-,9+,7-,9+,19+,26-,27+,26-,27+,26-,14-
NODE_3_length_23937_cov_233.356_ID_537
5+,10+,23+,28+
NODE_3_length_23937_cov_233.356_ID_537'
28-,23-,10-,5-
NODE_4_length_20545_cov_359.166_ID_539
18+,1-,3+,23-,24+
NODE_4_length_20545_cov_359.166_ID_539'
24-,23+,3-,1+,18-

First a line gives the path name, then one or more lines give the paths. If the paths are made up of more than one part, then each part is on a separate line with a semicolon in between. This is useful for when the contig was made by scaffolding across gaps and a single graph path doesn't exist.
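The layout described above can be read with a small state machine. The following Python sketch is based only on that description (a name line, then one or more path lines, with a trailing semicolon meaning another part follows); `parse_paths` is a hypothetical name, and this is not code from SPAdes or Bandage.

```python
def parse_paths(text):
    """Parse '.paths'-style text into {path_name: [part, ...]}.

    Each part is a list of (node_name, strand) tuples. Assumes a line
    ending in ';' is followed by another part of the same path.
    """
    paths = {}
    name = None
    expect_part = False  # False: the next non-empty line is a path name
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not expect_part:
            name = line
            paths[name] = []
            expect_part = True
            continue
        more_parts = line.endswith(";")
        part = []
        for token in line.rstrip(";").split(","):
            # Each token is a node name followed by its strand.
            if len(token) < 2 or token[-1] not in "+-":
                raise ValueError(f"bad node {token!r} in path {name!r}")
            part.append((token[:-1], token[-1]))
        paths[name].append(part)
        expect_part = more_parts
    return paths


example = """NODE_1_length_237403_cov_243.219_ID_533
22-,13-,11+,13-,12+;
29+,21-,20+,21-,6-
NODE_3_length_23937_cov_233.356_ID_537
5+,10+,23+,28+
"""
for path_name, parts in parse_paths(example).items():
    print(path_name, len(parts), "part(s)")
```

Keeping the parts as separate lists preserves the scaffolding gaps, so a viewer could draw each part as its own contiguous line.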

So here is my tentative plan: add a new node colour scheme in Bandage for highlighting graph paths. The user can load in a file (like the one above) where paths could be visualised one at a time (by selecting the path name in a drop-down box). Also, the GFA format allows for path lines which have paths in a similar comma-delimited string. So if you open a GFA graph in Bandage which contains paths, the paths would be ready to visualise.

Do you have any thoughts on the matter? If I implement things as planned, you could either save your paths to a separate file or include them with your graph in GFA format.
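For the GFA option, a path line could be read roughly as in the Python sketch below. It assumes GFA1-style `P` lines (tab-separated: record type, path name, comma-delimited oriented segment names, then overlaps); the function name `parse_gfa_paths` and the tiny example graph are made up for illustration and are not taken from Bandage.

```python
def parse_gfa_paths(gfa_text):
    """Collect {path_name: [(segment_name, strand), ...]} from GFA 'P' lines.

    Assumes GFA1-style tab-separated path lines; other record types are skipped.
    """
    paths = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "P" or len(fields) < 3:
            continue
        name, segments = fields[1], fields[2]
        steps = []
        for token in segments.split(","):
            # Each step is a segment name followed by its orientation.
            if len(token) < 2 or token[-1] not in "+-":
                raise ValueError(f"bad step {token!r} in path {name!r}")
            steps.append((token[:-1], token[-1]))
        paths[name] = steps
    return paths


# A made-up two-segment graph with one path, for illustration only.
gfa = (
    "H\tVN:Z:1.0\n"
    "S\t1\tACGT\n"
    "S\t2\tGGCC\n"
    "L\t1\t+\t2\t-\t0M\n"
    "P\tcontig_1\t1+,2-\t0M\n"
)
print(parse_gfa_paths(gfa))
```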

rrwick avatar Dec 18 '15 05:12 rrwick

Hi Ryan,

Thanks for your response and I'm glad you're on board with implementing it. I'll let Samarth (who is the developer of the Falcon2Fastg project I'm involved with -- the reason why I posted this issue) reply to your response.

The aesthetics of representing those paths can be tricky. I think that being able to visualize multiple paths as lines that are drawn alongside the nodes, as opposed to highlighting nodes with multiple colors, could be less confusing. For example, see https://networkx.lanl.gov/trac/raw-attachment/ticket/199/PathDrawerSimple.png from NetworkX (code here: https://networkx.lanl.gov/trac/ticket/199). But in the end it is your call, as this suggestion may be complex to implement.

rchikhi avatar Dec 18 '15 16:12 rchikhi

Hi Ryan,

I'd like to mention a couple of thoughts from a Falcon2Fastg point of view:

  1. Between the two flavors (GFA and .paths) that you proposed, we would prefer the .paths implementation. This way we could just add a function to Falcon2Fastg to write the .paths file, instead of having to convert everything to GFA.
  2. Along with a drop down box for visualizing paths one-at-a-time, is it possible to load all the paths at the same time, and distinguish each path by coloring it with a different color? This will be useful to see the relationship between all overlapping reads and all contigs in a long read assembly, just by loading the reads.fastg and the associated ctgs.paths file.

Thanks! -Samarth

md5sam avatar Dec 18 '15 16:12 md5sam

Ryan,

Thanks for the quick reply. I just wanted to second the vote in favor of .paths over .GFA. Also, I think Rayan's idea of having colored lines along the paths is a great one, as it will be much easier to interpret in the case of overlapping paths, which are often the challenging (and thus interesting) parts of the assembly.

Cheers, R

rozovr avatar Dec 19 '15 09:12 rozovr

Okay, I'll go ahead with adding support for a '.paths' file. I'll probably include GFA paths support as well, as I'd like Bandage to support the GFA format as much as possible. Regarding how to visualise it, I'm convinced that coloured lines would be useful. A drop-down box would let the user view all the paths at once or one at a time.

I'll make this feature a higher priority, so stay tuned!

rrwick avatar Dec 21 '15 00:12 rrwick

Great! Thanks Ryan.

md5sam avatar Dec 21 '15 00:12 md5sam

GFA paths are implemented in #64

asl avatar May 02 '18 23:05 asl