
colabfold_batch misuses templates from pdb-hit-file when run locally

Open • bbardiaux opened this issue • 1 comment

Expected Behavior

When running colabfold_batch, all templates found by colabfold_search in batch mode (multiple query sequences in a single multi-FASTA file) should be used appropriately: the templates found for a given query sequence should be used when predicting the structure from the corresponding a3m MSA.

Current Behavior

colabfold_batch takes only the first 20 unique PDB IDs from the pdb-hit-file and puts them in the templates/ folder, even though the pdb-hit-file generated by colabfold_search contains hits for all input query sequences. As a result, templates found for one query can be used when predicting another. (See the related issue https://github.com/sokrypton/ColabFold/issues/523 about a bug in colabfold_search with templates; I was able to get past the colabfold_search step with a small workaround.)
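To make the difference concrete, here is a minimal Python sketch contrasting the selection described above with the expected per-query selection. The helpers `first_unique_pdb_ids` and `unique_pdb_ids_per_query` are hypothetical and do not reproduce ColabFold's code; they only restate the behaviour reported here, assuming the hit file is tab-separated with the query ID in the first column and a `pdbid_chain` target (e.g. `4uhd_A`) in the second, as in the m8 excerpt in this report.

```python
def first_unique_pdb_ids(lines, limit=20):
    """Reported behaviour (as described in this issue): the first `limit` unique
    PDB IDs in the whole file, ignoring which query each hit belongs to."""
    seen = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pdb_id = fields[1].split("_")[0]  # "4uhd_A" -> "4uhd"
        if pdb_id not in seen:
            seen.append(pdb_id)
            if len(seen) == limit:
                break
    return seen


def unique_pdb_ids_per_query(lines, limit=20):
    """Expected behaviour: group hits by the query ID in column 1 and keep up to
    `limit` unique PDB IDs per query, preserving file order."""
    per_query = {}
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        query, pdb_id = fields[0], fields[1].split("_")[0]
        ids = per_query.setdefault(query, [])
        if pdb_id not in ids and len(ids) < limit:
            ids.append(pdb_id)
    return per_query
```

Passing an open handle on `msas/pdb100_230517.m8` as `lines`, the second function would return a separate list of up to 20 PDB IDs for each query ID, instead of one list for the whole file.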

Steps to Reproduce (for bugs)

colabfold_search --threads 8 --db-load-mode 2 --use-templates 1 --db2 pdb100_230517 all.fasta ${COLABFOLD_DB} msas

colabfold_batch --num-recycle 12 --templates --pdb-hit-file msas/pdb100_230517.m8 --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ msas/ results/
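One possible workaround, untested here, is to split the combined hit file by query ID and pass each part to a separate colabfold_batch run via `--pdb-hit-file`. The sketch below assumes the query ID is the first tab-separated column; the `split_hit_file` helper and the `<query_id>.m8` naming are hypothetical. This report does not show how the numeric query IDs map to the a3m files written by colabfold_search, so matching each part to its MSA is left open.

```python
from pathlib import Path


def split_hit_file(m8_path, out_dir):
    """Write one hit file per query ID (column 1); return {query_id: Path}."""
    groups = {}
    with open(m8_path) as src:
        for line in src:
            query = line.split("\t", 1)[0].strip()
            if not query:
                continue  # skip blank lines
            # Normalise the final line so every output line ends with a newline
            groups.setdefault(query, []).append(line if line.endswith("\n") else line + "\n")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for query, group in groups.items():
        path = out_dir / f"{query}.m8"  # hypothetical naming scheme
        path.write_text("".join(group))
        paths[query] = path
    return paths
```

Hits keep their original order within each output file, so any "first 20" selection applied to a single part only sees that query's hits.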

Context

all.fasta

$ cat all.fasta
>DnaN_AE003852.1_100634_99867
MKFTIERSHLIKPLQQVSGTLGGRASLPILGNLLLKVEENQLSMTATDLEVELISRVTLEGEFEAGSITVPARKFLDICRGLPDSAVITVLLEGDRIQVRSGRSRFSLATLPASDFPNIEDWQSEVQVSLTQAELRGLIEKTQFSMANQDVRYYLNGMLFEIDGTTLRSVATDGHRMAVAQAQLGADFAQKQIIVPRKGVLELVKLLDAPEQPVVLQIGHSNLRAEVNHFVFTSKLVDGRFPDYRRVLPQHTSKTLQTGCEELRQAFSRAAILSNEKFRGVRVNLADNGMRITANNPEQEEAEELLDVSFEGEPIEIGFNVSYILDVLNTLRCDNVRVSMSDANASALVENVDDDSAMYVVMPIRL:MIDTHAHVYASEFDHDRDEVIARARQVGIEKILMPNIDLNSIAPMLATEKAYPDLCHSMMGLHPCYVDANVKQTLATIYEWFSRHTFIAVGEIGIDLYWDKTFKAEQEMAFLTQLNWAKELDLPVVIHTRDSLNETLALLKQAQDGRLRGVFHCFGGSVDEAKAINDLGFHLGIGGVSTFKNSGMDQVIPQLDLNYVILETDCPYLAPVPHRGKRNEPMLTHLISEKVAQLRSLPLGEVIKITNNNSKALFGLDK
>DnaN_AE003852.1_1006539_1005943
MKFTIERSHLIKPLQQVSGTLGGRASLPILGNLLLKVEENQLSMTATDLEVELISRVTLEGEFEAGSITVPARKFLDICRGLPDSAVITVLLEGDRIQVRSGRSRFSLATLPASDFPNIEDWQSEVQVSLTQAELRGLIEKTQFSMANQDVRYYLNGMLFEIDGTTLRSVATDGHRMAVAQAQLGADFAQKQIIVPRKGVLELVKLLDAPEQPVVLQIGHSNLRAEVNHFVFTSKLVDGRFPDYRRVLPQHTSKTLQTGCEELRQAFSRAAILSNEKFRGVRVNLADNGMRITANNPEQEEAEELLDVSFEGEPIEIGFNVSYILDVLNTLRCDNVRVSMSDANASALVENVDDDSAMYVVMPIRL:MEKHSHKEDWIAILTGTFLVAQGVYFLQAGQLLTGGTTGLALLMTQFLPLTFGVLYFLSNCPFYLLAWKRFGARFAFNSAISGALVSIFADHLAMLITLEKVNVVYCAVAGGLLMGLGMLILFRHRSSLGGFNVLCLFIQDRFGISVGKSQMAIDGLILLASFFFVSPLTIGLSILGAFLLNIVLAMNHKPSRYRVIY
>DnaN_AE003852.1_1021527_1021997
MKFTIERSHLIKPLQQVSGTLGGRASLPILGNLLLKVEENQLSMTATDLEVELISRVTLEGEFEAGSITVPARKFLDICRGLPDSAVITVLLEGDRIQVRSGRSRFSLATLPASDFPNIEDWQSEVQVSLTQAELRGLIEKTQFSMANQDVRYYLNGMLFEIDGTTLRSVATDGHRMAVAQAQLGADFAQKQIIVPRKGVLELVKLLDAPEQPVVLQIGHSNLRAEVNHFVFTSKLVDGRFPDYRRVLPQHTSKTLQTGCEELRQAFSRAAILSNEKFRGVRVNLADNGMRITANNPEQEEAEELLDVSFEGEPIEIGFNVSYILDVLNTLRCDNVRVSMSDANASALVENVDDDSAMYVVMPIRL:MPKQKASYEALLEEVVETLKHSPDGVNEIVESSAKYVDAANDLTKDELALISAYVKADLKEFSQSFEQSKSSPFYLMITNSIWQGLLDITDRTKVEWVELFADLEHQGLYQAGDMIGLGVLICDQCGHKTEFNHPTEIEPCSQCGGKAFSRQPLKP
....
$ cat msas/pdb100_230517.m8 
102	4uhd_A	0.196	254	191	6	22	269	23	269	3.809E-45	167	47M5I21M1D42M2D30M2D31M2I32M1D38M
102	6i8w_B	0.187	266	205	4	7	267	45	304	1.259E-41	157	61M4I65M4D64M2I29M1D36M
102	5egn_E	0.211	265	199	4	5	266	1	258	2.832E-40	153	15M2I46M5I98M2D52M1D44M
102	6eb3_A	0.193	264	203	5	10	270	7	263	1.831E-39	151	9M2I23M1D27M5I128M1D28M1D39M
...
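For reference, each line of the hit file above has 13 tab-separated fields, and the query column holds a numeric ID (`102`) rather than the FASTA header. The sketch below assumes the fields follow the standard BLAST tabular (m8) order (query, target, identity, alignment length, mismatches, gap openings, query start/end, target start/end, E-value, bit score) followed by a CIGAR string; the field names, `parse_m8_line` and `cigar_length` are illustrative and not part of ColabFold.

```python
import re
from dataclasses import dataclass


# Assumed column order: standard BLAST-tab (m8) fields plus a trailing CIGAR.
@dataclass(frozen=True)
class M8Hit:
    query: str      # numeric query ID as written by the search step, e.g. "102"
    target: str     # PDB ID and chain, e.g. "4uhd_A"
    fident: float   # fraction of identical residues
    alnlen: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    tstart: int
    tend: int
    evalue: float
    bits: int
    cigar: str

    @property
    def pdb_id(self) -> str:
        return self.target.split("_")[0]  # "4uhd_A" -> "4uhd"


def parse_m8_line(line: str) -> M8Hit:
    f = line.rstrip("\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 tab-separated fields, got {len(f)}")
    return M8Hit(
        query=f[0], target=f[1], fident=float(f[2]),
        alnlen=int(f[3]), mismatch=int(f[4]), gapopen=int(f[5]),
        qstart=int(f[6]), qend=int(f[7]), tstart=int(f[8]), tend=int(f[9]),
        evalue=float(f[10]), bits=int(f[11]), cigar=f[12],
    )


def cigar_length(cigar: str) -> int:
    # Sum of all M/I/D operation lengths, e.g. "47M5I" -> 52
    return sum(int(n) for n, _op in re.findall(r"(\d+)([MID])", cigar))
```

For the first hit listed (`4uhd_A`), the CIGAR operations add up to 254, which matches the alignment-length column.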

Your Environment

colabfold release 1.5.3

bbardiaux, Nov 19 '23

+1. Has this issue been addressed?

beazerj, Mar 25 '24