Completely turn off new clipping behaviour
Hi @maickrau
Since the new release, and as you mentioned in https://github.com/maickrau/GraphAligner/issues/28, --precise-clipping is turned on by default.
For my evaluation of reconstruction accuracy, I would like to turn this completely off. However, the lower limit is 0.501:
precise clipping identity cutoff must be between 0.501 and 0.999
Is there a specific reason? Thanks for any feedback!
The lower limit exists because random alignments have about 50% identity, so a lower cutoff would treat random alignments as valid alignments. Can you say a bit more about your evaluation? Do you want the entire sequences aligned end-to-end?
Ah, that's where it's coming from. In https://github.com/pangenome/pgge I am measuring the reconstruction accuracy of a pangenome graph. I want to find out how well the sequences from which we created the pangenome graph are preserved in the actual graph. For this I use a so-called query sequence containment metric: https://github.com/pangenome/rs-peanut#query-sequence-containment-qsc. Unfortunately, GitHub images are somehow broken at the moment, so here is the idea: I count the number of nucleotide matches across all queries and divide it by the total length of all queries.
If the cutoff is at 0.501, I will miss some nucleotide matches of the query. So I won't get the complete picture.
I understand the need to prevent random alignments, but for us it would be helpful to take a look at everything.
Does this make sense to you?
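To make the metric concrete, here is a minimal sketch of the ratio described above. It is not the rs-peanut implementation; the function name and the `(query_length, matched_bases)` input format are assumptions made up for illustration.

```python
def query_sequence_containment(queries):
    """Return total matched bases divided by total query length.

    `queries` is an iterable of (query_length, matched_bases) pairs,
    one pair per query. This input format is hypothetical and chosen
    only to illustrate the ratio.
    """
    total_matches = 0
    total_length = 0
    for query_length, matched_bases in queries:
        if matched_bases < 0 or matched_bases > query_length:
            raise ValueError("matched_bases must be between 0 and query_length")
        total_matches += matched_bases
        total_length += query_length
    if total_length == 0:
        raise ValueError("no query bases to evaluate")
    return total_matches / total_length


# Two queries: one with 90 of 100 bases matched, one fully matched.
qsc = query_sequence_containment([(100, 90), (50, 50)])
```

Any matches that sit in a clipped part of an alignment never reach `matched_bases`, so the ratio is lower than it would be with the full alignment.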
To give a concrete example: one sequence in the graph is the full chm13 chr8 sequence. When aligning this sequence back to the created pangenome graph, we split it into 100 kb chunks to avoid running out of memory and to make the mapping more efficient. So clipping does not make sense here.
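The splitting step is roughly the following. This is a simplified sketch, not the actual pgge code; the function name and the returned `(offset, chunk)` pairs are assumptions for illustration.

```python
def split_into_chunks(sequence, chunk_size=100_000):
    """Split a sequence into consecutive, non-overlapping chunks.

    Returns a list of (start_offset, chunk) pairs so that each chunk
    can be traced back to its position in the original sequence.
    The last chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, sequence[start:start + chunk_size])
        for start in range(0, len(sequence), chunk_size)
    ]


# Small demonstration with a 12 bp sequence and 5 bp chunks.
chunks = split_into_chunks("ACGTACGTACGT", chunk_size=5)
```

Since every chunk comes straight from a sequence that is embedded in the graph, each chunk is expected to align over its full length.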
@maickrau I could open a PR from https://github.com/subwaystation/GraphAligner/tree/precise_clipping_min_0.001 that lowers the minimum allowed --precise-clipping value to 0.001, as was the case in the old GraphAligner version. I am not sure whether 0.0 would slow down GraphAligner?
Anyhow, @ekg and I would be very happy if we could have a chat with you about sequence-to-graph alignment :) It is hard to find your email address, so here is mine: [email protected]. I can arrange things, or feel free to contact me. Cheers!