
Get suboptimal scores

Open iprada opened this issue 7 years ago • 2 comments

Hi,

I am working on probabilistic alignment, where I compute a probability for the best alignment by comparing its score with the scores of the n best suboptimal alignments. I guess that this is not implemented in edlib for speed reasons. Would there be a chance of implementing it, so that a user can retrieve the best n hits, or at least the second-best hit (which would be great for estimating the probability of the best hit from the second-best score)?

Thanks a lot!

iprada Dec 21 '17 16:12

Hi @iprada, could you explain in more detail what exactly you would like Edlib to be able to do?

You said you would like it to return the best n hits -> what does that mean? Let's say the best score is 20. Would you like to know the n scores that come right after it? I am afraid that is something Edlib can't return, as it always calculates the optimal score. But I am also pretty sure that the suboptimal scores would always be 19, 18, 17, and so on. How would that help you?

Martinsos Dec 24 '17 00:12

Hi, I am really sorry for answering so late; I missed the email notification.

Sorry for not explaining myself well before. I have worked around the problem by masking the alignment returned by edlib and then looking for the 2nd hit, and so on until I get the number of hits I wanted.
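The masking workaround can be sketched roughly as follows. This is a minimal pure-Python illustration, not edlib code: `best_infix_hit` and `top_hits` are hypothetical helpers, and the quadratic dynamic program only stands in for an infix-style (query inside target) edit-distance search. Masked target positions are simply treated as never matching.

```python
def best_infix_hit(query, target, masked):
    """Return (distance, start, end) for the best alignment of `query`
    to a substring target[start:end]. Masked positions never match."""
    m = len(query)
    dist = list(range(m + 1))   # column for the empty target prefix
    start = [0] * (m + 1)       # where each partial alignment begins
    best = (dist[m], 0, 0)
    for j, t in enumerate(target, start=1):
        new_dist = [0] * (m + 1)    # row 0: free start anywhere in target
        new_start = [j] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if (query[i - 1] == t and not masked[j - 1]) else 1
            new_dist[i], new_start[i] = min(
                (dist[i - 1] + cost, start[i - 1]),      # match or mismatch
                (dist[i] + 1, start[i]),                 # skip a target char
                (new_dist[i - 1] + 1, new_start[i - 1]), # skip a query char
            )
        dist, start = new_dist, new_start
        if dist[m] < best[0]:       # keep the first end position on ties
            best = (dist[m], start[m], j)
    return best


def top_hits(query, target, n):
    """Find up to n hits by masking each hit before searching again."""
    masked = [False] * len(target)
    hits = []
    for _ in range(n):
        d, s, e = best_infix_hit(query, target, masked)
        if s == e:                  # nothing left to align against
            break
        hits.append((d, s, e))
        for k in range(s, e):
            masked[k] = True        # hide this hit from later rounds
    return hits


hits = top_hits("ACGT", "TTACGTTTTTACGATT", 2)
print(hits)
```

Because each round hides the previous hit, the second result comes from a different region of the target instead of being a shifted copy of the first alignment.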

What I was trying to say is the following:

When aligning short reads to a genome, aligners compute a mapping score for an alignment by looking at the alignment score of the best hit and the score of the 2nd hit. Briefly, if there is a big difference between the score of the best alignment and that of the second-best alignment, the best alignment gets a high mapping score. Even though this is not explained in the programs, I assume that the aligners find the second-best hit by masking the position of the 1st hit (because, as you said, the suboptimal scores would otherwise be 19, 18, 17...). So what I was trying to say was that it would be great, at least when using edlib to align short sequences to long sequences, to keep track of some suboptimal hits in order to assess the score of the best hit. But anyway, I think this is not so important, as one can work around it by masking the positions of the best alignment.
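To make the idea concrete, here is a toy sketch of turning the best and second-best edit distances into a confidence value. The exponential weighting, the `scale` parameter, and the Phred-style cap of 60 are all assumptions chosen for illustration; they are not taken from edlib or from any particular read aligner.

```python
import math


def best_hit_probability(distances, scale=1.0):
    """Toy probability that the lowest-distance hit is the true one.
    Each hit gets weight exp(-distance / scale) (an assumed model)."""
    weights = [math.exp(-d / scale) for d in distances]
    return max(weights) / sum(weights)


def phred(p, cap=60):
    """Convert a probability of being correct into a capped Phred-like score."""
    if p >= 1.0:
        return cap
    return min(cap, round(-10 * math.log10(1.0 - p)))


# Best hit at distance 0; compare a close and a distant runner-up.
for runner_up in (1, 5):
    p = best_hit_probability([0, runner_up])
    print(runner_up, round(p, 3), phred(p))
```

With this model, two equally good hits give a probability of 0.5, and the value rises as the gap between the best and second-best distances grows, which is the behavior described above.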

Thanks for reading!

iprada Jan 19 '18 10:01