pyScaf
Fewer BUSCO genes after scaffolding
Hi,
I would like to give some feedback on scaffolding my assembly (Sanger technology) with PacBio reads (30x coverage) using pyScaf.
pyScaf is fast and, at first glance, gives interesting results. I went from 2,059 scaffolds to 1,344 scaffolds, which was encouraging. Then I ran BUSCO on both assemblies and got the following results:
The original assembly (before pyScaf) has 95.6% complete BUSCO genes, versus 78.7% after pyScaf. Before scaffolding 37 genes are missing; after pyScaf 284 genes are missing.
I ran pyScaf with these parameters:
pyScaf.py -f Scaffolds.fasta --identity 0.80 -o Scaffolds.pyScaf.fasta -t 10 --log pyScaf_run.log --longreads all_raw_reads.Pacbio.fasta
Maybe I need to change them? Do you have any advice for me?
Hi, this is probably the same problem as the one mentioned in issue #3:
Additionally, there might be some over-scaffolding: many contigs that seemed to have a large overlap were linked directly (without any check of whether the contigs actually overlapped).
In this example (.tsv output of a long read scaffolding run), a 2.4 Mb scaffold and a 3.3 Mb scaffold are merged into a 3.3 Mb scaffold. 2.4 Mb of non-redundant sequence is lost in the process.
scaffold00018 3324699 2 scaffold31_size2472606 scaffold20_size3324684 1 0 -3065490 0
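To make the loss explicit, here is a minimal sketch that recomputes it from the row above. It assumes (this is my reading of the row, not documented pyScaf behaviour) that the second field is the length of the merged scaffold and that each input scaffold name carries its own length in a `_size<N>` suffix. The function name `lost_sequence` is made up for this example.

```python
import re

# Assumption: input scaffold names end with "_size<length>".
SIZE_RE = re.compile(r"_size(\d+)$")

def lost_sequence(row: str) -> int:
    """Return (sum of input scaffold lengths) - (merged scaffold length)."""
    fields = row.split()
    # Assumption: the second field is the length of the merged scaffold.
    merged_len = int(fields[1])
    component_sizes = [int(m.group(1)) for f in fields if (m := SIZE_RE.search(f))]
    return sum(component_sizes) - merged_len

row = ("scaffold00018 3324699 2 scaffold31_size2472606 "
       "scaffold20_size3324684 1 0 -3065490 0")
print(lost_sequence(row))  # 2472591
```

Under those assumptions about 2.47 Mb of input sequence is not represented in the merged scaffold, which matches the 2.4 Mb mentioned above. Running the same check on every row of the .tsv should flag the suspicious merges.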
Hi!
Yes, I found the problem! I used OPERA to scaffold my Sanger assembly with PacBio reads and I saw that OPERA merged some contigs, which caused this problem with BUSCO. As OPERA writes a file with the scaffolding information, I wrote a script to perform "manual" scaffolding without merging my contigs, and it works perfectly! The BUSCO results are very good after that. If someone encounters such problems with OPERA, contact me and I will provide my script.
Thank you, Amandine
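For readers who want to try the same idea, here is a rough sketch of scaffolding without merging. This is not Amandine's script, and it does not parse OPERA's output: it assumes you have already extracted, for each scaffold, an ordered list of (contig name, strand) pairs. The names `join_without_merging`, `layout` and `gap` are made up for this example. Contigs are kept whole and separated by a fixed run of Ns, so no sequence can be lost to an overlap merge.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N, either case)."""
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]

def join_without_merging(contigs, layout, gap=100):
    """Concatenate contigs in the given order and orientation.

    contigs: dict mapping contig name -> sequence
    layout:  list of (name, strand) pairs, strand being "+" or "-"
    gap:     number of Ns placed between neighbours (arbitrary choice)
    """
    parts = []
    for name, strand in layout:
        seq = contigs[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
    # Full contigs are kept; overlaps are never collapsed.
    return ("N" * gap).join(parts)

contigs = {"ctgA": "ACGTAC", "ctgB": "GGGTTT"}
scaffold = join_without_merging(contigs, [("ctgA", "+"), ("ctgB", "-")], gap=5)
print(scaffold)  # ACGTACNNNNNAAACCC
```

The scaffold length is always the sum of the contig lengths plus the gaps, so complete BUSCO genes present in the input contigs should still be present afterwards.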
Hi Amandine @a-velt
I am facing the same problem now. Could you share your script with me?
Thanks in advance, Guangshuo