miniprot

Use information from conserved introns

Open mparker2 opened this issue 3 months ago • 2 comments

Dear @lh3 ,

Thanks for developing miniprot. I have been trying it out & it is extremely useful.

I had an idea for a possible enhancement... it would be very interesting to be able to provide known intron positions (within the query protein sequences), and have a bonus score for alignments that include these. Many introns are very well conserved across species in terms of position and phase.

I'm not sure how this information would best be provided to miniprot. Perhaps as a BED or GFF file with protein coordinates and phase information, showing how each query protein sequence is subdivided into exons.
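To make the idea concrete, here is a minimal sketch of how intron positions and phases in protein coordinates could be derived from the coding lengths of a gene's exons. The function names and the BED-like column layout are hypothetical and purely illustrative; nothing here is a format that miniprot reads.

```python
from itertools import accumulate


def intron_positions(exon_cds_lengths):
    """Return a (codon_index, phase) pair for each intron.

    exon_cds_lengths: coding lengths in nucleotides of consecutive exons.
    codon_index is the 0-based codon at which the intron falls.
    phase is the number of bases of that codon lying before the intron:
    0 = between codons, 1 = after the first base, 2 = after the second.
    """
    # Cumulative coding length at each exon end; the last value is the
    # end of the CDS, not an intron, so it is dropped.
    boundaries = list(accumulate(exon_cds_lengths))[:-1]
    return [(b // 3, b % 3) for b in boundaries]


def to_bed_like(protein_id, exon_cds_lengths):
    """Format introns as tab-separated lines (hypothetical layout):
    protein_id, codon start (0-based), codon end (exclusive), phase."""
    return [
        f"{protein_id}\t{codon}\t{codon + 1}\t{phase}"
        for codon, phase in intron_positions(exon_cds_lengths)
    ]


if __name__ == "__main__":
    # Three exons with 100, 50 and 30 coding nucleotides give two introns.
    for line in to_bed_like("protA", [100, 50, 30]):
        print(line)
```

Storing the phase alongside the codon index matters because two introns at the same residue but in different phases are generally not treated as the same conserved intron.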

Best wishes Matt

mparker2, Mar 18 '24 17:03

Thanks. GeMoMa is doing something similar. However, it is difficult to use position-specific scoring along the protein sequence (easier along the genome sequence), and it is also difficult for users to extract the information.

lh3, Mar 18 '24 23:03

OK, thanks - I was not aware of GeMoMa. Perhaps a script for postprocessing of miniprot alignments might achieve a similar result. I might try this.
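A rough sketch of what such a postprocessing step might look like, assuming the alignments have already been parsed into records that carry a score and a list of introns as (codon_index, phase) pairs in protein coordinates. The record layout, the `rescore` function and the bonus value are all hypothetical; this does not parse miniprot output or use any miniprot option.

```python
def rescore(alignments, known_introns, bonus=10.0):
    """Add a fixed bonus to each alignment for every known intron it recovers.

    alignments: list of dicts with keys "id", "score" and "introns", where
        "introns" is an iterable of (codon_index, phase) pairs
        (hypothetical record layout).
    known_introns: set of (codon_index, phase) pairs for the query protein.
    bonus: score added per matching intron (arbitrary illustrative value).
    Returns new records sorted by adjusted score, best first.
    """
    rescored = []
    for aln in alignments:
        # An intron counts as conserved only if position and phase both match.
        matched = len(set(aln["introns"]) & known_introns)
        rescored.append({
            **aln,
            "matched_introns": matched,
            "adjusted_score": aln["score"] + bonus * matched,
        })
    return sorted(rescored, key=lambda a: a["adjusted_score"], reverse=True)


if __name__ == "__main__":
    known = {(33, 1), (50, 0)}
    hits = [
        {"id": "a", "score": 100.0, "introns": [(33, 1)]},
        {"id": "b", "score": 95.0, "introns": [(33, 1), (50, 0)]},
    ]
    for hit in rescore(hits, known):
        print(hit["id"], hit["matched_introns"], hit["adjusted_score"])
```

Because this only re-ranks alignments that were already reported, it cannot recover an alignment the aligner never produced, unlike a bonus applied during alignment.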

mparker2, Mar 19 '24 12:03