
Fragmented contigs

Open Sebastien-Raguideau opened this issue 6 years ago • 4 comments

Hello, I am looking at the contig graph produced using contig2fastg. I see sequential contigs with no branching. Maybe I am missing something, but if there is no branching, why are those contigs not merged into a single one? Do I need to post-process the assembly and look for those cases? Here is a Bandage close-up. Also, it seems that the coverage for both is 1; is that relevant? (image: example) Best, Seb

Sebastien-Raguideau avatar Sep 20 '18 15:09 Sebastien-Raguideau

Interesting. Could you show me the two sequences and the k used?

voutcn avatar Nov 01 '18 22:11 voutcn

Hello, I ran a grep on the .gfa to extract the sequences and links; they are in the attached file. If you are interested, I have other examples. Also, what I said previously about the coverage being 1 was a mistake on my part: it is ~5. k=141. Fragmented_contigs.txt Best, Seb
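For readers who want to reproduce the extraction without grep, here is a minimal sketch in Python. It assumes a GFA 1 file, where segment records are tab-separated `S` lines (name, sequence) and link records are `L` lines (from, orientation, to, orientation, overlap). The function names `read_gfa` and `links_of` and the toy records are hypothetical and are not taken from the attached file.

```python
def read_gfa(lines):
    """Collect segments and links from GFA 1 text lines."""
    segments = {}  # segment name -> sequence
    links = []     # (from, from_orient, to, to_orient, overlap)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 6:
            links.append(tuple(fields[1:6]))
    return segments, links


def links_of(name, links):
    """Return every link that touches the segment called `name`."""
    return [link for link in links if name in (link[0], link[2])]


# Toy records, invented for illustration only.
example = [
    "H\tVN:Z:1.0\n",
    "S\t1\tACGTT\n",
    "S\t2\tGTTCA\n",
    "S\t3\tTTTT\n",
    "L\t1\t+\t2\t-\t3M\n",
]

segments, links = read_gfa(example)
print(segments["2"])
print(links_of("1", links))
```

For a real file, pass an open file handle instead of the list, for example `read_gfa(open(path))` with `path` pointing at the .gfa. The orientation fields in the `L` lines matter here, because a `-` means that segment has to be reverse complemented before comparing sequences.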

Sebastien-Raguideau avatar Nov 02 '18 09:11 Sebastien-Raguideau

The two sequences do not share any common k-mers, so I am not sure why they are connected. Did you use megahit_toolkit's contig2fastg to generate the graph?

voutcn avatar Nov 11 '18 04:11 voutcn

Hi, They do share a k-mer of length 141, but one of them needs to be reverse complemented. I didn't realise this when I first submitted. It's because I'm using the .gfa format, which stores each sequence in only one orientation so that the files are smaller. So, yes, I usually use megahit_toolkit contig2fastg to generate the graph. I then use one of Bandage's functions to convert the fastg file to .gfa. After that I need a script for renaming, because megahit_toolkit contig2fastg changes the names of the contigs. It keeps them in the same order though, so that is fine. Best, Seb
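The strand issue described above can be checked with a short script. This is only a sketch, assuming plain A/C/G/T sequences; the helper names and the toy sequences are hypothetical, and k is set to 5 for readability rather than the k=141 used in this assembly.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a, b, k):
    """k-mers of `a` found in `b` as stored, and in the reverse complement of `b`."""
    a_kmers = kmers(a.upper(), k)
    b = b.upper()
    forward = a_kmers & kmers(b, k)
    reverse = a_kmers & kmers(reverse_complement(b), k)
    return forward, reverse


# Toy sequences, invented for illustration only.
k = 5
a = "AAACCGGTTCA"
# b is stored in the opposite orientation, as can happen in a .gfa file.
b = reverse_complement("GTTCATTTGG")

forward, reverse = shared_kmers(a, b, k)
print("same strand:", forward)
print("after reverse complement:", reverse)
```

With the toy input, nothing is shared on the stored strand, and the overlap only shows up once `b` is reverse complemented, which is the situation in the two contigs discussed here.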

Sebastien-Raguideau avatar Nov 11 '18 11:11 Sebastien-Raguideau