AF_Cluster

Heuristics / Rule of Thumb for selecting --gap_cutoff (default 0.25)

Open ackbar03 opened this issue 3 years ago • 0 comments

Hi,

Can I ask how I should determine what the gap_cutoff parameter should be for different sequences?

For the target sequence I am looking at, the 25% cutoff removes the majority of sequences from the MSA (620 of 758) and clustering gives only 2 clusters. Neither of these MSA clusters gives a good prediction with AF2; the models are completely off, with no recognizable structure.

Thanks! The relevant log is attached below.

    620 seqs removed for containing more than 25% gaps, 138 remaining
    eps     n_clusters  n_not_clustered
    3.00    1           34
    3.50    1           34
    4.00    1           34
    4.50    1           34
    5.00    1           34
    5.50    1           34
    6.00    1           34
    6.50    1           34
    7.00    1           34
    7.50    1           34
    8.00    1           34
    8.50    1           34
    9.00    1           34
    9.50    2           31
    10.00   1           34
    10.50   2           31
    11.00   1           34
    Selected eps=9.50
    138 total seqs
    2 clusters, 127 of 138 not clustered (0.92)
    avg identity to query of unclustered: 0.38
    avg identity to query of clustered: 0.31
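
To make the question concrete, here is a minimal sketch of how one might check how many sequences survive at different gap cutoffs before running the clustering. It is not taken from the AF_Cluster code: the helper names `gap_fraction` and `survivors_by_cutoff` are hypothetical, the input is assumed to be already-aligned rows of equal length with `-` as the gap character, and a sequence is assumed to be kept when its gap fraction is at most the cutoff (following the wording of the log above; the actual script may use a strict comparison).

```python
from typing import Dict, Iterable, List


def gap_fraction(seq: str) -> float:
    """Fraction of alignment columns in `seq` that are gaps ('-')."""
    # An empty row is treated as all gaps so that it is always filtered out.
    return seq.count("-") / len(seq) if seq else 1.0


def survivors_by_cutoff(seqs: List[str], cutoffs: Iterable[float]) -> Dict[float, int]:
    """Count the sequences kept at each cutoff (gap fraction <= cutoff).

    Assumption: "kept" means at most `cutoff` gaps, mirroring the log line
    "removed for containing more than 25% gaps".
    """
    fracs = [gap_fraction(s) for s in seqs]
    return {c: sum(f <= c for f in fracs) for c in cutoffs}


# Toy alignment (made-up rows, not real data): gap fractions 0, 0.25, 0.5, 0.75.
toy_msa = ["ACDEFGHK", "AC--FGHK", "A----GHK", "------HK"]
counts = survivors_by_cutoff(toy_msa, [0.25, 0.5, 0.75])
print(counts)  # {0.25: 2, 0.5: 3, 0.75: 4}
```

Running a scan like this on the real MSA would show how quickly the number of retained sequences grows as the cutoff is relaxed, which is the trade-off the question is about.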

ackbar03 • Nov 25 '22 09:11