--frameshift parameter value
Hey,
I am currently investigating how to best use diamond with long, error-prone reads. I am aware of the --long-reads option, but I wonder why the recommended value of -F is 15.
I ran a small analysis in which I took some RefSeq assemblies and cut them up into long reads. I then copied the reads and introduced errors so that this second read set resembles nanopore reads in its rates of substitutions, insertions and deletions (11% overall error rate). Mapping the reads to a reference database with diamond blastx using different frameshift parameters, I get the following results:

Could you explain why -F 15 is recommended instead of -F 1? It seems like I am losing most hits using -F 15.
Best, Stefan
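For anyone who wants to reproduce the setup described above, here is a minimal sketch of the read simulation. It is not the script used for the analysis: the function names `chop` and `add_errors`, the fixed read length, and the split of the 11% overall error rate into roughly 4% substitutions, 3% insertions and 4% deletions are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def chop(genome, read_len):
    """Cut a genome string into consecutive, non-overlapping reads."""
    return [genome[i:i + read_len] for i in range(0, len(genome), read_len)]

def add_errors(seq, sub=0.04, ins=0.03, dele=0.04, rng=None):
    """Return a copy of seq with random substitutions, insertions and deletions.

    The default rates are assumed values that sum to roughly 11%;
    they are not taken from a real nanopore error profile.
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        r = rng.random()
        if r < dele:
            continue  # deletion: drop this base
        if r < dele + sub:
            # substitution: pick any base different from the original
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < ins:
            out.append(rng.choice(BASES))  # insertion after this base
    return "".join(out)
```

Insertions and deletions are the error types that shift the reading frame, so they are the ones the -F penalty acts on in a translated search.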
I must admit that I never looked into this. The frameshift feature was modeled after the LAST aligner, and this is the penalty its author recommends. This is certainly an interesting observation and worth further study!
Ok, thank you for your reply! I will use -F 1 then until I read something opposing that. Thank you for developing and maintaining diamond, I use it a lot and it's incredibly helpful!
I have put a bit more research into this, as I thought my choice of --id, coupled with the high error rate, might have affected the results. So I redid the analysis with different percent identity cutoffs to the reference and a set of simulated reads with a lower error rate.
80% identity

70% identity

60% identity

So it seems that the percent identity cutoff, in combination with the error rate, influences the number of hits quite strongly: at high error rates, the higher the frameshift penalty, the fewer hits are found. At higher identity cutoffs (>70%), I will opt for -F 1 for long reads with higher error rates.
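A parameter sweep like the one above could be scripted roughly as follows. This is only a sketch: the helper names, the output file naming, and the default frameshift values (1 and 15, the two discussed in this thread) are assumptions, and the paths are placeholders. Since --long-reads sets -F 15 itself (as I understand the documentation, it is short for --range-culling --top 10 -F 15), the sketch spells those options out so that -F can be varied.

```python
import itertools

def diamond_commands(query, db, frameshifts=(1, 15), identities=(60, 70, 80)):
    """Build one diamond blastx command per (-F, --id) combination."""
    cmds = []
    for f, ident in itertools.product(frameshifts, identities):
        out = f"hits_F{f}_id{ident}.tsv"  # hypothetical output naming scheme
        cmds.append([
            "diamond", "blastx",
            "--query", query, "--db", db, "--out", out,
            "--outfmt", "6",                  # tabular output, one HSP per line
            "--range-culling", "--top", "10", # what --long-reads sets besides -F
            "-F", str(f), "--id", str(ident),
        ])
    return cmds

def count_hits(path):
    """Count non-empty lines (reported alignments) in a tabular output file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())
```

Each command can then be run with `subprocess.run(cmd, check=True)`, and `count_hits` applied to the resulting files to compare the number of hits per setting.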