
An observed discrepancy between alignment-type 3Di+AA / 3Di

Open Wangchentong opened this issue 8 months ago • 2 comments

Expected Behavior

Thanks for your amazing tool! I am clustering a high-confidence subset of AFDB with two alignment types, 3Di+AA and 3Di. My intuition is that 3Di should give more non-singleton clusters than 3Di+AA, because highly diverse sequences that share the same structure would be assigned to the same cluster in 3Di mode but to different clusters in 3Di+AA mode.

Current Behavior

I tested the cluster command with both alignment types on the same database (a subset containing 4 million AFDB structures): --alignment-type 0 (3Di) gives me 470715 singletons, and --alignment-type 1 (3Di) gives me 759500 singletons.

This is the cluster command I used: foldseek cluster afdb50_new afdb50_new_clust_v2 tmp --remove-tmp-files --alignment-type 0/--alignment-type 2
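For anyone who wants to reproduce the singleton counts, here is a minimal sketch of how they could be computed. It assumes the cluster result has first been exported to a two-column, tab-separated file (representative, member), for example with foldseek createtsv; the function name count_singletons and the file name clusters.tsv are hypothetical and only for illustration.

```shell
# Count clusters that contain exactly one member.
# Assumption: $1 is a tab-separated file with one line per member,
# formatted as representative<TAB>member. Such a file could be produced
# with something like:
#   foldseek createtsv afdb50_new afdb50_new afdb50_new_clust_v2 clusters.tsv
count_singletons() {
    awk -F'\t' '
        { size[$1]++ }                      # tally members per representative
        END {
            n = 0
            for (rep in size)
                if (size[rep] == 1) n++     # a cluster of size 1 is a singleton
            print n
        }
    ' "$1"
}

# Hypothetical usage:
#   count_singletons clusters.tsv
```

Running this once per --alignment-type setting on the same input database should give directly comparable numbers.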

Is this the expected result? It looks like clustering based solely on 3Di tokens works worse than 3Di+AA. What's your suggestion if I want to cluster on structure without AA tokens?

Any help would be greatly appreciated!

Wangchentong, Jun 13 '24 13:06