NOVOPlasty
Extending Assembly from Seed
Hi, I am trying to extend my assembly from my contig file, but the assembly still failed to produce a complete genome. Do you have any suggestions on any parameters or methods that I could change to improve my assembly? Attached is my log file. Thanks. log_Cryptocoryne_nurii1.txt
Did it extend the seed a bit, or not at all? It's best to run it again with the extended log set to 1 and send me that file.
Thanks for the reply. I used a different contig file as the seed and it managed to produce a circularized assembly. However, I suspect the genome is still not complete. I have not found any published data for my genus yet, but the published chloroplast genomes of related taxa are around 150k to 170k bp, whereas mine is just 128938 bp. How can I check the data and the orientation of the inverted repeat, and improve my assembly? Is it possible for me to get a complete plastome? log_extended_Cryptocoryne_nurii4.txt log_Cryptocoryne_nurii4.txt
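For the length comparison described above, a minimal Python sketch could report the length of every sequence in a FASTA file and flag whether it falls in the 150k to 170k bp range quoted for related taxa. The function names `read_fasta` and `length_report` are hypothetical helpers written for this illustration; they are not part of NOVOPlasty.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence line: append to the current record
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def length_report(records, low=150_000, high=170_000):
    """Return (name, length, within_range) for each record.

    The default range is the 150k to 170k bp span mentioned in the thread;
    adjust it for other taxa.
    """
    return [(n, len(s), low <= len(s) <= high) for n, s in records.items()]
```

To use it, read the assembly file into a string (for example with `open(path).read()`) and pass the result of `read_fasta` to `length_report`. A 128938 bp contig would be flagged as outside the default range.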
Maybe the inverted repeat is not inverted, which is possible in some species, and then the assembly circularizes early. Why don't you try a higher minimum genome range, like 150000, to see what you get...
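To get a rough idea of whether a repeat in an assembled sequence is direct or inverted, one option is to count k-mers that recur on the same strand versus k-mers whose reverse complement also occurs. The sketch below is an illustrative assumption, not how NOVOPlasty detects repeats; `repeat_summary` is a hypothetical helper, and `k` should be odd so that no k-mer equals its own reverse complement.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def repeat_summary(seq, k=31):
    """Count k-mer start positions involved in direct or inverted repeats.

    'direct'   counts positions whose k-mer occurs more than once on the same strand.
    'inverted' counts positions whose k-mer's reverse complement also occurs.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    direct = sum(len(p) for p in positions.values() if len(p) > 1)
    inverted = 0
    for kmer, p in positions.items():
        rc = reverse_complement(kmer)
        if rc != kmer and rc in positions:
            inverted += len(p)
    return {"direct": direct, "inverted": inverted}
```

A typical plastome with a normal inverted repeat should give a large `inverted` count and a small `direct` count. A large `direct` count would be consistent with the repeat being in direct orientation.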
Thank you for your suggestion. The output produced 4 smaller contigs after I changed the genome range. What do you think is the reason? log_Cryptocoryne_nurii6.txt log_extended_Cryptocoryne_nurii6.txt
There is a bug that doesn't always output all the contigs when the option "extend seed directly" is used. Could you try this version with the same config file? (I forgot whether I fixed the problem; if not, I will do it today.) NOVOPlasty3.8.2.zip
So a big contig of 160000 was assembled, but it just wasn't output.
And where do you get that large contig from? Is it from a different assembly software?
Yes. Actually, I used the largest contig from the Fast-Plast assembly. If I do not invoke the extend seed function, the output is only an uncircularized genome. I will try the version you provided ASAP.
I just tried the version you gave me and it still produces the same result.
Ok still have to fix, will have a look. Have you tried to run with the previous assembly, but just with a short seed (like the RUBP seed on this github)?
Yes, but it only produces a small contig. log_Cryptocoryne_nurii15.txt log_extended_Cryptocoryne_nurii15.txt
But then you need to switch off "extend seed directly"; this option should only be used to extend an existing assembly... So could you try again without that option?
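The settings discussed in this thread (genome range, extended log, extend seed directly) are changed in the config file. As a rough sketch, a small Python helper could rewrite one `key = value` line at a time. The key spellings and the `key = value` layout in the example are assumptions based on how the options are named in this thread, so check them against your own config file. `set_option` is a hypothetical helper, and the upper bound of 200000 is only a placeholder.

```python
import re


def set_option(config_text, key, value):
    """Replace the value on the first 'key = value' line.

    Raises KeyError if no line starts with the given key.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=).*$",
        re.MULTILINE | re.IGNORECASE,
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + " " + value, config_text, count=1
    )
    if count == 0:
        raise KeyError(key)
    return new_text


# Assumed config excerpt; the real file may use different keys or spacing.
example = (
    "Genome Range          = 120000-200000\n"
    "Extended log          = 0\n"
    "Extend seed directly  = yes\n"
)
updated = set_option(example, "Extend seed directly", "no")
updated = set_option(updated, "Extended log", "1")
updated = set_option(updated, "Genome Range", "150000-200000")
```

Raising an error when the key is missing avoids silently running with an unchanged setting.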
I think this version should output the larger contig (but I am not sure) NOVOPlasty3.8.2.zip
I did. The output was one contig with a length of 12564 bp.
I tried your latest version of NOVOPlasty and it managed to output a contig of ~160 kbp. How can I merge those contigs or see if I can get them circularized? Merged_contigs_Cryptocoryne_nurii17.txt log_Cryptocoryne_nurii17.txt log_extended_Cryptocoryne_nurii17.txt
It couldn't circularize automatically and seems you have a complex genome so I am not sure that will be possible.. Could you send the extended log of the 12564 bp run? I am just curious why it is that short. What kind of data are you using, is it WGS, capture or RNA seq?
My data is WGS. Attached is my extended log log_extended_Cryptocoryne_nurii16.txt log_extended_Cryptocoryne_nurii6.txt
Sorry, I didn't have the time to look at it earlier, but it seems that the 12 kb sequence doesn't occur in the 120 kb assembly. The 12 kb region can't be extended further because it is flanked by AT-rich regions where the coverage drops to 0, so completely circularising your cp genome does not seem possible. But I would keep that 12 kb sequence, because it is a part of the chloroplast genome that was missing from the assembly. Could you run it with this seed and send me the extended log:
AAGTATCGTGAATTTCTTCATGCTCGTTCCAAGTTCGAAGTACCATTTGTACAAATAAGAATCCCTTTCCTTACATGATTTCTTCTTCATATAGATAGATATAGGATCTATGGGGCAATTACTTAGAAGTACATTTTGTGCAACAGCCCTTCCTATCTGATAGAAAAGGATCCCATGATCCTGAACCGATCTGACCCGGGATC
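The observation above that the 12 kb sequence does not occur in the 120 kb assembly can be checked roughly by measuring how many k-mers of one sequence are found in the other, on either strand. This is a minimal sketch under that assumption, not the maintainer's method, and `shared_kmer_fraction` is a hypothetical helper. An aligner such as BLAST would give a more detailed answer.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (upper-cased first)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_set(seq, k):
    """Return the set of all k-mers in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query, target, k=31):
    """Fraction of distinct query k-mers present in target on either strand."""
    query_kmers = kmer_set(query, k)
    if not query_kmers:
        return 0.0
    target_kmers = kmer_set(target, k) | kmer_set(reverse_complement(target), k)
    return len(query_kmers & target_kmers) / len(query_kmers)
```

A fraction near 0 suggests the query is absent from the target, while a fraction near 1 suggests it is already contained in it, possibly in the opposite orientation.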
Thank you for your time. I tried running it with the seed you provided. Attached is my extended log: log_extended_Cryptocoryne_nurii25.txt
Hi, sorry, could you run it again without giving a reference? A reference can help, but it reverses the reads to assemble them in the same direction as the reference, so I would just like to see the assembly in the other direction. And is the reference closely related? But it seems this assembly had the same problem of low coverage in another AT-rich region.
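To see where AT-rich stretches lie in a contig or seed, the AT fraction can be computed in sliding windows. The sketch below assumes a plain DNA string as input; `at_fraction_windows` and its window and step sizes are illustrative choices, not values taken from NOVOPlasty.

```python
def at_fraction_windows(seq, window=100, step=50):
    """Return (start, AT fraction) for sliding windows along seq.

    Sequences shorter than the window yield a single, shorter window.
    """
    seq = seq.upper()
    results = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        if chunk:
            at_count = chunk.count("A") + chunk.count("T")
            results.append((start, at_count / len(chunk)))
    return results
```

Windows with a very high AT fraction near the ends of a contig would be consistent with the low-coverage, AT-rich flanks described in this thread, although coverage itself has to be read from the extended log or a read mapping.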
log_extended_Cryptocoryne_nurii18.txt Attached is the extended log. The reference is not really close, but it is the closest that is currently available; the same Family but different Order in taxa. I suspected the low coverage problem, so I am trying to combine different methods or manually explore different parameters to get reliable results. I have tried assembling it with MITObim and it managed to get one single contig (~167 kb), but the orientation seemed wrong when I tried to annotate it, so I need to correct the assembly before annotating it.
If you use MITObim de novo it should be OK, but don't use the reference-based assembly. I checked this mode a few times and it generally just copies the reference sequence into the assembly, so it will give a false result. It's very misleading because at first sight it gives a good result. Does Fast-Plast also use reference genomes? If so, a part of the sequence could be inaccurate, although I can't say for sure; NOVOPlasty could be wrong too.