
sub-optimal performance on fungal genome

Open flashton2003 opened this issue 7 years ago • 5 comments

Hello,

I'm not sure who is monitoring these issues now that @satta has left Sanger, but I wasn't sure where else to send this.

Sascha showed me how to run companion on the command line using a fungidb reference. I have attached the config I used (crypto.config.txt). It took me some time to get around to quality-checking the annotation, and there seems to be a bit of a problem. I ran a couple of my own samples through and there were a large number of pseudo-genes (3372 of 5237 total without exonerate, 2727 of 5514 with exonerate). These were not just slightly miscalled as pseudo-genes: when I took the protein output of companion and blasted it against the reference proteome, only 3515 of 5514 proteins had 60% reciprocal coverage (i.e. 60% of the query protein was covered by a hit which covered 60% of the reference protein).
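The reciprocal coverage check described above could be sketched roughly as follows. This is a minimal illustration, not necessarily the script actually used: it assumes BLAST+ tabular output written with the custom column list shown in the comment, and it evaluates each HSP on its own rather than merging multiple HSPs per hit. The function name `reciprocal_coverage_hits` is made up for this example.

```python
import csv
import io

# Assumed column order, matching a BLAST+ run with (assumption, not from the post):
#   -outfmt "6 qseqid sseqid qstart qend qlen sstart send slen"
COLUMNS = ["qseqid", "sseqid", "qstart", "qend", "qlen", "sstart", "send", "slen"]


def reciprocal_coverage_hits(handle, min_cov=0.6):
    """Return query IDs with at least one HSP that covers >= min_cov of the
    query protein and >= min_cov of the reference (subject) protein."""
    passing = set()
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # Aligned span length divided by full sequence length, on each side.
        q_cov = (abs(int(row["qend"]) - int(row["qstart"])) + 1) / int(row["qlen"])
        s_cov = (abs(int(row["send"]) - int(row["sstart"])) + 1) / int(row["slen"])
        if q_cov >= min_cov and s_cov >= min_cov:
            passing.add(row["qseqid"])
    return passing


# Tiny made-up example: p1 passes on both sides, p2 fails on query coverage.
example = (
    "p1\tr1\t1\t80\t100\t1\t70\t100\n"
    "p2\tr2\t1\t50\t100\t1\t90\t100\n"
)
print(sorted(reciprocal_coverage_hits(io.StringIO(example))))
```

Counting the returned set against the total number of predicted proteins gives a figure comparable to the "3515 of 5514" above, under the stated single-HSP assumption.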

As another quality check, I ran the reference genome fasta through companion. This is identical to the sequence in fungidb, so it should yield a very similar annotation. However, there were 2693 pseudo-genes, and only 3811 of 5681 proteins had 60% reciprocal coverage against the reference proteome.

I was wondering if I could get some pointers on where to start debugging. Perhaps the RATT parameters? The fact that lots of pseudo-genes are still called even when the input is identical to the reference genome suggests the transfer is not working well.

Best,

Phil

flashton2003, Jan 25 '17 10:01