companion
sub-optimal performance on fungal genome
Hello,
I'm not sure who is monitoring these issues now that @satta has left Sanger, but I wasn't sure where else to send this.
Sascha showed me how to run companion on the command line using a FungiDB reference; I have attached the config I used (crypto.config.txt). It took me some time to get around to quality-checking the annotation, and there seems to be a problem. I ran a couple of my own samples through and a large number of genes were called as pseudo-genes (3372 of 5237 in total without exonerate, 2727 of 5514 with exonerate). These were not just slightly miscalled as pseudo-genes: when I took the protein output of companion and blasted it against the reference proteome, only 3515 of 5514 proteins had 60% reciprocal coverage (i.e. 60% of the query protein was covered by a hit which covered 60% of the reference protein).
As another quality check, I ran the reference genome FASTA through companion. It is identical to the sequence in FungiDB, so it should give a very similar annotation. However, there were 2693 pseudo-genes, and only 3811 of 5681 proteins had 60% reciprocal coverage against the reference proteome.
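For anyone wanting to reproduce the 60% reciprocal coverage check, it can be sketched roughly as below. This is a minimal illustration and not necessarily the exact script used for the numbers above. It assumes BLAST tabular output with the columns `qseqid sseqid qstart qend qlen sstart send slen` (in that order), and it treats each row as a single hit rather than merging several HSPs; the function names are made up for the example.

```python
import csv
import io


def passes_reciprocal_coverage(qstart, qend, qlen, sstart, send, slen, threshold=0.6):
    """True if one hit covers >= threshold of both the query and the subject."""
    q_cov = (abs(qend - qstart) + 1) / qlen  # fraction of query protein covered
    s_cov = (abs(send - sstart) + 1) / slen  # fraction of reference protein covered
    return q_cov >= threshold and s_cov >= threshold


def proteins_with_reciprocal_hit(handle, threshold=0.6):
    """Return the set of query IDs with at least one hit passing the check.

    Assumed column order (an assumption of this sketch):
    qseqid sseqid qstart qend qlen sstart send slen
    """
    passed = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid = row[0]
        qstart, qend, qlen, sstart, send, slen = map(int, row[2:8])
        if passes_reciprocal_coverage(qstart, qend, qlen, sstart, send, slen, threshold):
            passed.add(qseqid)
    return passed


# Toy input (invented values, not real results)
demo = io.StringIO(
    "p1\tr1\t1\t80\t100\t1\t90\t100\n"   # 80% of query, 90% of reference
    "p2\tr2\t1\t80\t100\t1\t80\t200\n"   # 80% of query, 40% of reference
    "p3\tr3\t11\t70\t100\t5\t64\t100\n"  # 60% of query, 60% of reference
)
print(sorted(proteins_with_reciprocal_hit(demo)))
```

The count of proteins passing is then the size of the returned set, compared against the total number of predicted proteins.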
I was just wondering if I could get some pointers on where to start debugging. Perhaps the RATT parameters? The fact that lots of pseudo-genes are still called even when the input is identical to the reference genome suggests the transfer is not working well.
Best,
Phil