openfold
UnboundLocalError in generate_chain_data_cache.py
Hi, I successfully generated the mmCIF cache, but ran into an issue while trying to generate the chain data cache:
python3 scripts/generate_chain_data_cache.py /data/openfold/data/pdb_mmcif/mmcif_files_new/ chain_data_cache.json --cluster_file clusters-by-entity-40.txt --no_workers 16
 10%|███████▋ | 18931/188155 [12:41<1:45:11, 26.81it/s]
 10%|███████▊ | 18941/188155 [12:42<1:56:22, 24.23it/s]
 10%|███████▊ | 18961/188155 [12:43<2:04:01, 22.74it/s]
 21%|████████████████▎ | 39711/188155 [30:50<4:28:44, 9.21it/s]
 21%|████████████████▎ | 39731/188155 [30:50<2:54:58, 14.14it/s]
 21%|████████████████▎ | 39741/188155 [30:50<2:23:13, 17.27it/s]
 37%|████████████████████████████▌ | 69940/188155 [55:33<1:33:54, 20.98it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "scripts/generate_chain_data_cache.py", line 57, in parse_file
    local_data["resolution"] = 0.
UnboundLocalError: local variable 'local_data' referenced before assignment
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts/generate_chain_data_cache.py", line 132, in <module>
    main(args)
  File "scripts/generate_chain_data_cache.py", line 96, in main
    for d in p.imap_unordered(fn, files, chunksize=args.chunksize):
  File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 354, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
UnboundLocalError: local variable 'local_data' referenced before assignment
Any help would be highly appreciated.
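For context on the error itself: Python raises UnboundLocalError (rather than NameError) when a name is assigned somewhere in a function, which makes it local for the whole function body, but is read on a code path where it was never bound. The toy function below, `toy_parse_file`, is hypothetical and only mimics that control flow; it is not the script's actual parse_file.

```python
def toy_parse_file(ext: str) -> dict:
    """Toy stand-in: local_data is bound on the ".cif" path only."""
    if ext == ".cif":
        local_data = {}
        local_data["resolution"] = 2.0
        return local_data
    # Because local_data is assigned above, Python compiles it as a local
    # name for the whole function. On any other path it was never bound,
    # so this subscript store raises UnboundLocalError.
    local_data["resolution"] = 0.
    return local_data


print(toy_parse_file(".cif"))  # works
try:
    toy_parse_file(".pdb")
except UnboundLocalError as e:
    print("pdb path failed:", e)
```

The exact message text differs between Python versions (3.7 prints "referenced before assignment"), but the exception type is the same.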
The issue turned out to be the block starting at line 47 in generate_chain_data_cache.py:
elif(ext == ".pdb"):
    with open(os.path.join(args.data_dir, f), "r") as fp:
        pdb_string = fp.read()
    protein_object = protein.from_pdb_string(pdb_string, None)
    chain_dict = {}
    chain_dict["seq"] = residue_constants.aatype_to_str_sequence(
        protein_object.aatype,
    )
    local_data["resolution"] = 0.
    cluster_size = chain_cluster_size_dict.get(file_id.upper(), -1)
    if(chain_cluster_size_dict is not None):
        cluster_size = chain_cluster_size_dict.get(
            full_name.upper(), -1
        )
    chain_dict["cluster_size"] = cluster_size
    out = {file_id: chain_dict}
The variables local_data and full_name are not assigned in this branch, i.e. for .pdb files.
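A minimal sketch of the .pdb branch with both names removed, assuming (as line 41 of the quoted snippet already does) that a .pdb entry's cluster size should be looked up by file_id, since the output dict is keyed by file_id. The helper name `build_pdb_chain_entry` and its `seq` parameter are hypothetical: `seq` stands in for the `residue_constants.aatype_to_str_sequence(protein_object.aatype)` result so the sketch runs without OpenFold installed. It also keeps the lookup behind the `None` check, because calling `.get` on `chain_cluster_size_dict` while it is `None` would raise AttributeError.

```python
from typing import Dict, Optional


def build_pdb_chain_entry(
    file_id: str,
    seq: str,
    chain_cluster_size_dict: Optional[Dict[str, int]] = None,
) -> Dict[str, dict]:
    """Hypothetical helper mirroring the .pdb branch, all names bound locally."""
    chain_dict = {}
    chain_dict["seq"] = seq
    # Write into chain_dict (which exists here) instead of local_data,
    # which is never bound on this path. PDB files get resolution 0.
    chain_dict["resolution"] = 0.
    # Default used when no cluster file was given or the ID is missing.
    cluster_size = -1
    if chain_cluster_size_dict is not None:
        # Assumption: key by file_id, because full_name is never assigned
        # for .pdb files.
        cluster_size = chain_cluster_size_dict.get(file_id.upper(), -1)
    chain_dict["cluster_size"] = cluster_size
    return {file_id: chain_dict}


print(build_pdb_chain_entry("1abc", "MKV", {"1ABC": 12}))
```

In the script itself this amounts to writing `chain_dict["resolution"] = 0.` instead of `local_data["resolution"] = 0.` and passing `file_id.upper()` instead of `full_name.upper()` to the cluster lookup, under the keying assumption above.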
Hello, the updated version of this file still does not fix this issue (I get the same UnboundLocalError, but for the full_name variable). Any news on this?