AF_Cluster
Issue with the DBSCAN hyperparameter (eps) scan
Hi, I am using AF_Cluster to predict alternative conformations for some proteins. However, the epsilon scan fails in cases where it is stopped by the condition eps>10 and n_clust==1.
May I ask why this condition is mandatory in the DBSCAN scan? What is the best way to work around the situation where only one cluster is found (n_clust==1) for every eps <= 10?
Referring to file AF_Cluster/scripts/ClusterMSA.py
Lines 115-116:
    if eps>10 and n_clust==1:
        break
Thank you very much! Tengyu
Hi Tengyu, thanks for your interest in the code! I coded it that way because, in my screening, eps was typically smaller than 10, so if only one cluster had been detected by eps = 10, continuing to scan larger values was a waste of time.
Can I ask roughly what your MSA sizes and protein lengths are when you encounter this? I am happy to make it an optional flag, or if you'd like to submit a PR, I'm happy to go that route too.
Best,
Hannah
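For readers following along, here is a minimal sketch of the kind of scan being discussed, with the early stop made optional. It is not the code in ClusterMSA.py: the function name `scan_eps`, the parameter `early_stop_eps`, and the `count_clusters` callback (a stand-in for running DBSCAN at a given eps and counting the resulting clusters) are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def scan_eps(
    eps_values: Sequence[float],
    count_clusters: Callable[[float], int],
    early_stop_eps: Optional[float] = 10.0,
) -> Tuple[float, List[int]]:
    """Scan eps values and return the eps that gave the most clusters.

    count_clusters is a hypothetical callback: it should run DBSCAN at the
    given eps and return the number of clusters found.
    Pass early_stop_eps=None to disable the early stop entirely.
    """
    n_clusters: List[int] = []
    for eps in eps_values:
        n_clust = count_clusters(eps)
        n_clusters.append(n_clust)
        # Heuristic discussed in this thread: once eps is past the cutoff and
        # there is still only one cluster, stop scanning larger values.
        if early_stop_eps is not None and eps > early_stop_eps and n_clust == 1:
            break
    # Index of the first eps that reached the maximum cluster count.
    best_index = max(range(len(n_clusters)), key=n_clusters.__getitem__)
    return eps_values[best_index], n_clusters


# Toy stand-in for DBSCAN (made-up numbers): one cluster until eps reaches 20.
def toy_count(eps: float) -> int:
    return 1 if eps < 20 else 6


eps_grid = [float(e) for e in range(3, 31)]  # 3.0, 4.0, ..., 30.0

best_default, seen_default = scan_eps(eps_grid, toy_count)
best_full, seen_full = scan_eps(eps_grid, toy_count, early_stop_eps=None)
print(best_default, len(seen_default))
print(best_full, len(seen_full))
```

With the default cutoff, the toy scan stops at eps = 11 and never reaches the range where more clusters appear. With `early_stop_eps=None` it covers the whole grid and picks eps = 20.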
(Reply sent by email to Tengyu's post of Nov 4, 2022, 6:26 AM: https://github.com/HWaymentSteele/AF_Cluster/issues/1)
Thanks for your reply.
I see what you mean, but the condition is a little too strict for some MSAs. Note that I generated the MSAs following the AF2 protocol. Making the condition an optional flag would be a good way to make the program more flexible.
One such case is 3QF4_A, whose MSA size is 8359 and whose sequence length is 572.
Regards, Tengyu
Hello, I support adding code to make the tool more generally applicable. In my case, the best epsilon was about 22. I found this after setting "--max_eps=30" to relax the maximum epsilon and commenting out the two break lines mentioned by the original poster.
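As a rough illustration of the optional flag proposed above, the early stop could be exposed on the command line so that nobody has to comment out lines by hand. Only `--max_eps` comes from this thread. The flags `--early_stop_eps` and `--no_early_stop`, the helper names, and all default values below are hypothetical and are not part of AF_Cluster.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of eps-scan options")
    # --max_eps is the option mentioned in this thread; the default is a placeholder.
    parser.add_argument("--max_eps", type=float, default=20.0,
                        help="Largest eps value to scan.")
    # Hypothetical: eps cutoff above which a single-cluster result ends the scan.
    parser.add_argument("--early_stop_eps", type=float, default=10.0,
                        help="Stop scanning past this eps if only one cluster is found.")
    # Hypothetical: switch the early stop off completely.
    parser.add_argument("--no_early_stop", action="store_true",
                        help="Always scan up to --max_eps.")
    return parser


def resolve_early_stop(args: argparse.Namespace) -> Optional[float]:
    """Return the cutoff to use, or None when the early stop is disabled."""
    return None if args.no_early_stop else args.early_stop_eps


# Equivalent of the workaround described above, without editing the script.
args = build_parser().parse_args(["--max_eps=30", "--no_early_stop"])
print(args.max_eps, resolve_early_stop(args))

# With no arguments, the original behaviour (cutoff at eps = 10) is kept.
default_args = build_parser().parse_args([])
print(default_args.max_eps, resolve_early_stop(default_args))
```

Keeping the cutoff on by default would preserve the current behaviour for small MSAs, while users with larger eps values could opt out.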