PaddleHelix
Error in HelixFold3: TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 3jxv_A. Residue range: 11-115
Hello, I have encountered this error while trying the HelixFold3 app. You can see my JSON file below:
{
  "entities": [
    {
      "type": "protein",
      "sequence": "GVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKFDSSRDRNKPFKFMLGKQEVIRGWEEGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE",
      "count": 1
    }
  ]
}
2024-09-26 13:13:30 INFO Found an exact template match 3jxv_A.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 798, in _process_single_hit
features, realign_warning = _extract_template_features(
File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 629, in _extract_template_features
raise TemplateAtomMaskAllZerosError(
helixfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 3jxv_A. Residue range: 11-115
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 387, in process_chain_msa
raw_features = data_pipeline._process_single_chain(
File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer_parallel.py", line 213, in _process_single_chain
chain_features = self._monomer_data_pipeline.process(
File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/pipeline_parallel.py", line 271, in process
templates_result = self.template_featurizer.get_templates(
File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 957, in get_templates
result = _process_single_hit(
File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 820, in _process_single_hit
warning = ('%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: '
TypeError: must be real number, not NoneType
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 483, in process_input_json
_, raw_features, type_chain_id, seqs = future.result()
File "/root/miniconda3/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/root/miniconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
TypeError: must be real number, not NoneType
2024-09-26 13:13:30 ERROR Task generated an exception : must be real number, not NoneType
2024-09-26 13:23:57 INFO Finished Jackhmmer (uniprot.fasta) query in 634.199 seconds
[MSA/Template] protein_B; seq length: 104; use: 1252.6531014442444
2024-09-26 13:23:57 INFO [Multiprocess] All msa/template use: 1253.4603350162506
Traceback (most recent call last):
File "/app/PaddleHelix/apps/protein_folding/helixfold3/inference.py", line 637, in <module>
main(args)
File "/app/PaddleHelix/apps/protein_folding/helixfold3/inference.py", line 496, in main
feature_dict = feature_processing_aa.process_input_json(
File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 499, in process_input_json
all_feats = add_assembly_features(all_chain_features, ccd_preprocessed_dict, no_msa_templ_feats)
File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 303, in add_assembly_features
hf2_msa_feats = pipeline_multimer.process_with_all_chain_features(chain_group_feats)
File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer.py", line 121, in process_with_all_chain_features
input_seqs.add(str(chain_features["sequence"]))
KeyError: 'sequence'
Could someone help me resolve this issue?
Thanks in advance!
All the best, Gokhan
I have encountered the same error with another protein.
{
  "entities": [
    {
      "type": "protein",
      "sequence": "MKFQHTFIALLSLLTYANAYDYFTTTLANQNPVCASVDVIQNVCTEVCGRFVRYIPDATNTNQFTFAEYTTNQCTVQVTPAVTNTFTCADQTSSHALGSDWSGVCKITATPAPTVTPTVTPTVTPTVTPTPTNTPNPTPSQTSTTTGSASTVVASLSLIIFSMILSLC",
      "count": 1
    }
  ]
}
2024-10-01 16:59:22 DEBUG Reading PDB entry from /mnt/af2/pdb_mmcif/mmcif_files/4l3a.cif. Query: MKFQHTFIALLSLLTYANAYDYFTTTLANQNPVCASVDVIQNVCTEVCGRFVRYIPDATNTNQFTFAEYTTNQCTVQVTPAVTNTFTCADQTSSHALGSDWSGVCKITATPAPTVTPTVTPTVTPTVTPTPTNTPNPTPSQTSTTTGSASTVVASLSLIIFSMILSLC, template: DLSKPGKYVVTLNAENDLQKALPVQVMVIVEKETPIPDPTPTPTPDPTPTPDPSPTPNPVINPN
2024-10-01 16:59:22 INFO Found an exact template match 4l3a_A.
2024-10-01 16:59:22 WARNING Template structure not in release dates dict: 4l3a
2024-10-01 16:59:22 DEBUG Reading PDB entry from /mnt/af2/pdb_mmcif/mmcif_files/4l3a.cif. Query: MKFQHTFIALLSLLTYANAYDYFTTTLANQNPVCASVDVIQNVCTEVCGRFVRYIPDATNTNQFTFAEYTTNQCTVQVTPAVTNTFTCADQTSSHALGSDWSGVCKITATPAPTVTPTVTPTVTPTVTPTPTNTPNPTPSQTSTTTGSASTVVASLSLIIFSMILSLC, template: DLSKPGKYVVTLNAENDLQKALPVQVMVIVEKETPIPDPTPTPTPDPTPTPDPSPTPNPVINPN
2024-10-01 16:59:22 INFO Found an exact template match 4l3a_B.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/apptainers/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 798, in _process_single_hit
features, realign_warning = _extract_template_features(
File "/apptainers/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 629, in _extract_template_features
raise TemplateAtomMaskAllZerosError(
helixfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 4l3a_B. Residue range: 482-545
@ggokturkk @jscgh Hi all, these errors were caused by missing mmCIF structure coordinates and a syntax issue during template feature extraction. We have fixed it. Please check: https://github.com/PaddlePaddle/PaddleHelix/pull/357
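For anyone reading the tracebacks above: the secondary `TypeError: must be real number, not NoneType` is the message Python produces when a `%.2f` placeholder receives `None`, which matches the `sum_probs: %.2f` field in the warning string at `templates.py` line 820. The sketch below only reproduces that behaviour in isolation and shows one way to guard against it. It is not the code from the linked pull request; `format_hit_warning` and `naive_format` are hypothetical helpers written for illustration.

```python
def naive_format(sum_probs):
    # Same placeholder style as the warning string in the traceback.
    return "sum_probs: %.2f" % sum_probs


def format_hit_warning(pdb_id, chain_id, sum_probs, rank, error):
    """Hypothetical helper: build the warning without assuming the
    numeric fields are present."""
    sum_probs_text = "n/a" if sum_probs is None else "%.2f" % sum_probs
    rank_text = "n/a" if rank is None else "%d" % rank
    return "%s_%s (sum_probs: %s, rank: %s): feature extracting errors: %s" % (
        pdb_id, chain_id, sum_probs_text, rank_text, error)


# Reproduce the failure: formatting None with %.2f raises TypeError.
try:
    naive_format(None)
    naive_failed = False
except TypeError:
    naive_failed = True

# The guarded version still produces a readable warning.
message = format_hit_warning(
    "3jxv", "A", None, None, "Template all atom mask was all zeros")
print(naive_failed)
print(message)
```

With a guard like this, a template hit that fails feature extraction would be reported as a warning instead of raising a second exception inside the worker process.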