Heuristics / Rule of Thumb for selecting --gap_cutoff (default 0.25)
Hi,
Can I ask how I should choose the gap_cutoff value for different target sequences?
For my target sequence, the default 25% cutoff removes most sequences from the MSA (620 of 758, about 82%), and clustering then yields only 2 clusters. Neither cluster gives a good prediction with AF2: the models are completely off, with no discernible structure.
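One practical way to pick a cutoff is to see how many sequences survive at several candidate values before clustering. The sketch below assumes a simple rule: a sequence is dropped if more than `gap_cutoff` of its aligned columns are gaps (`-`), with a3m insertion columns (lowercase letters) stripped first. This is an illustrative reimplementation, not the AF-Cluster source, and the function names and toy MSA are hypothetical.

```python
from typing import Dict, List, Sequence


def strip_insertions(seq: str) -> str:
    """Remove a3m insertion states (lowercase letters) so rows align to the query.
    Assumption: the input MSA is in a3m format."""
    return "".join(c for c in seq if not c.islower())


def gap_fraction(seq: str) -> float:
    """Fraction of aligned columns in `seq` that are gaps ('-')."""
    return seq.count("-") / len(seq) if seq else 1.0


def filter_by_gaps(seqs: Sequence[str], gap_cutoff: float) -> List[str]:
    """Keep sequences with at most `gap_cutoff` gaps (drop those with more)."""
    return [s for s in seqs if gap_fraction(strip_insertions(s)) <= gap_cutoff]


def sweep_cutoffs(seqs: Sequence[str], cutoffs: Sequence[float]) -> Dict[float, int]:
    """Number of sequences retained at each candidate cutoff."""
    return {c: len(filter_by_gaps(seqs, c)) for c in cutoffs}


# Hypothetical toy alignment (8 columns each) for illustration only.
toy_msa = ["MKTAYIAK", "MKT-YIAK", "MK--YI-K", "----YIAK", "M-------"]

if __name__ == "__main__":
    for cutoff, kept in sweep_cutoffs(toy_msa, [0.25, 0.5, 0.75]).items():
        print(f"gap_cutoff={cutoff:.2f}: {kept}/{len(toy_msa)} sequences kept")
```

On a real MSA, plotting the retained count against the cutoff shows where loosening the threshold starts to recover many sequences, which is one reasonable place to look for a value above the default.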
Thanks! The relevant log is attached below:
620 seqs removed for containing more than 25% gaps, 138 remaining
eps     n_clusters  n_not_clustered
3.00    1           34
3.50    1           34
4.00    1           34
4.50    1           34
5.00    1           34
5.50    1           34
6.00    1           34
6.50    1           34
7.00    1           34
7.50    1           34
8.00    1           34
8.50    1           34
9.00    1           34
9.50    2           31
10.00   1           34
10.50   2           31
11.00   1           34
Selected eps=9.50
138 total seqs
2 clusters, 127 of 138 not clustered (0.92)
avg identity to query of unclustered: 0.38
avg identity to query of clustered: 0.31
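The "Selected eps=9.50" line is consistent with a simple rule: take the smallest eps that reaches the largest number of clusters in the scan. That rule is an assumption inferred from this log, not confirmed against the AF-Cluster code. The sketch below applies it to the scan values copied from the log.

```python
from typing import List, Tuple

# (eps, n_clusters, n_not_clustered), copied from the log in this issue.
SCAN: List[Tuple[float, int, int]] = [
    (3.00, 1, 34), (3.50, 1, 34), (4.00, 1, 34), (4.50, 1, 34),
    (5.00, 1, 34), (5.50, 1, 34), (6.00, 1, 34), (6.50, 1, 34),
    (7.00, 1, 34), (7.50, 1, 34), (8.00, 1, 34), (8.50, 1, 34),
    (9.00, 1, 34), (9.50, 2, 31), (10.00, 1, 34), (10.50, 2, 31),
    (11.00, 1, 34),
]


def select_eps(rows: List[Tuple[float, int, int]]) -> float:
    """Assumed rule: smallest eps among those giving the maximum cluster count."""
    best = max(n_clusters for _, n_clusters, _ in rows)
    return min(eps for eps, n_clusters, _ in rows if n_clusters == best)


if __name__ == "__main__":
    print(f"max clusters in scan: {max(r[1] for r in SCAN)}")
    print(f"selected eps: {select_eps(SCAN):.2f}")
```

Because no eps in the scan produces more than 2 clusters, changing eps alone cannot give more clusters from these 138 sequences; the number of sequences left after gap filtering is the limiting factor here.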