
compleasm protein mode finding multiple BUSCOs in same mRNA

Open jbh-cas opened this issue 10 months ago • 0 comments

I'm using compleasm in protein mode in the BRAKER 3.08 pipeline and have a script to annotate the BUSCOs in the braker.gff3 output. I've found that the counts do not quite agree with the summary.txt numbers. Looking into it, it seems that some transcripts hit more than one BUSCO.

Here's an example of the counts. If this is a better question for the BRAKER group please let me know.

$ awk 'NR>1 && NF>2{print $3}' bbc/better/full_table.tsv | sort -V | uniq -c | sort -k1,1nr | head
     37 g19735.t1
     25 g28340.t2
     20 g15812.t1
     20 g22138.t1
     19 g16782.t2
     19 g16782.t3
     18 g10537.t3
     18 g7891.t3
     18 g7891.t4
     18 g7891.t5
      ...

Most are single BUSCO hits, but there are 199 transcripts that hit more than one. Here's an example with 5 different BUSCOs:

$ grep g6310.t1 bbc/better/full_table.tsv 
76735at8457	Duplicated	g6310.t1	154.6	587
13359at8457	Duplicated	g6310.t1	249.3	618
41835at8457	Duplicated	g6310.t1	214.5	558
84588at8457	Duplicated	g6310.t1	161.4	507
71648at8457	Duplicated	g6310.t1	196.3	543
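
For anyone who wants to reproduce the tally outside of awk, here is a minimal Python sketch. It assumes full_table.tsv is tab-separated with one header line and that the first three columns are BUSCO ID, status, and transcript ID, as in the rows above. The header text and the function names are hypothetical, made up for illustration; this is not compleasm's or BRAKER's own code.

```python
from collections import defaultdict


def buscos_per_transcript(lines):
    """Map each transcript ID to the set of BUSCO IDs reported for it.

    Assumes tab-separated rows: BUSCO ID, status, transcript ID, ...
    The first line is treated as a header and skipped (like NR>1 in awk).
    """
    hits = defaultdict(set)
    for i, line in enumerate(lines):
        if i == 0:
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        # Rows without a transcript (e.g. Missing BUSCOs) are skipped,
        # mirroring the NF>2 filter in the awk one-liner.
        if len(fields) < 3 or not fields[2]:
            continue
        hits[fields[2]].add(fields[0])
    return hits


def multi_busco(hits):
    """Keep only transcripts that are reported for more than one BUSCO."""
    return {t: sorted(b) for t, b in hits.items() if len(b) > 1}


# Demo on the rows shown above; the header line is an assumed placeholder.
sample = [
    "Gene\tStatus\tSequence\tScore\tLength\n",
    "76735at8457\tDuplicated\tg6310.t1\t154.6\t587\n",
    "13359at8457\tDuplicated\tg6310.t1\t249.3\t618\n",
    "41835at8457\tDuplicated\tg6310.t1\t214.5\t558\n",
    "84588at8457\tDuplicated\tg6310.t1\t161.4\t507\n",
    "71648at8457\tDuplicated\tg6310.t1\t196.3\t543\n",
]
for transcript, buscos in multi_busco(buscos_per_transcript(sample)).items():
    print(transcript, len(buscos), ",".join(buscos))
```

To run it on the real file, pass an open file handle instead of the sample list, e.g. `buscos_per_transcript(open("bbc/better/full_table.tsv"))`.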

Thanks for any info, and again thanks for the tool and its many uses.

--jim henderson

jbh-cas · Apr 15 '24 21:04