compleasm
compleasm protein mode finding multiple BUSCOs in same mRNA
I'm using compleasm in protein mode within the BRAKER 3.08 pipeline and have a script that annotates the BUSCOs in the braker.gff3 output. The counts from my script do not quite agree with the numbers in summary.txt, and on closer inspection it seems that some transcripts are hit by more than one BUSCO.
If this is a better question for the BRAKER group, please let me know. Here's an example of the counts:
$ awk 'NR>1 && NF>2{print $3}' bbc/better/full_table.tsv | sort -V | uniq -c | sort -k1,1nr | head
37 g19735.t1
25 g28340.t2
20 g15812.t1
20 g22138.t1
19 g16782.t2
19 g16782.t3
18 g10537.t3
18 g7891.t3
18 g7891.t4
18 g7891.t5
...
Most are single BUSCO hits, but there are 199 transcripts that hit more than one. Here's an example with 5 different BUSCOs:
$ grep g6310.t1 bbc/better/full_table.tsv
76735at8457 Duplicated g6310.t1 154.6 587
13359at8457 Duplicated g6310.t1 249.3 618
41835at8457 Duplicated g6310.t1 214.5 558
84588at8457 Duplicated g6310.t1 161.4 507
71648at8457 Duplicated g6310.t1 196.3 543
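In case it helps with reproducing this, here is a rough sketch of how the transcripts with more than one distinct BUSCO can be listed. The function name `multi_busco` is made up for this post. It assumes column 1 of full_table.tsv is the BUSCO id and column 3 is the transcript id, that there is a single header line, and that rows without a hit have fewer than 3 fields (the same assumptions as the awk command above).

```shell
# multi_busco (hypothetical helper): list transcripts matched by more than
# one distinct BUSCO id, as "<count> <transcript>", highest count first.
# Assumed layout: col 1 = BUSCO id, col 3 = transcript id, one header line,
# rows without a hit have fewer than 3 fields.
multi_busco() {
    awk 'NR>1 && NF>2 && !seen[$1, $3]++ { n[$3]++ }   # count each BUSCO/transcript pair once
         END { for (t in n) if (n[t] > 1) print n[t], t }' "$1" |
        sort -k1,1nr -k2,2
}
```

Running `multi_busco bbc/better/full_table.tsv | wc -l` should then give the number of transcripts that hit more than one BUSCO, under those assumptions.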
Thanks for any info, and thanks again for the tool and its many uses.
--jim henderson