openfold

UnboundLocalError in generate_chain_data_cache.py

Open calmasri opened this issue 2 years ago • 2 comments

Hi, I successfully generated the mmcif cache, but ran into an issue while trying to generate the chain data cache:

python3 scripts/generate_chain_data_cache.py      /data/openfold/data/pdb_mmcif/mmcif_files_new/     chain_data_cache.json     --cluster_file clusters-by-entity-40.txt     --no_workers 16
 37%|████████████████████████████▌                                                | 69940/188155 [55:33<1:33:54, 20.98it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
 File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 121, in worker
   result = (True, func(*args, **kwds))
 File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
   return list(map(*args))
 File "scripts/generate_chain_data_cache.py", line 57, in parse_file
   local_data["resolution"] = 0.
UnboundLocalError: local variable 'local_data' referenced before assignment
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "scripts/generate_chain_data_cache.py", line 132, in <module>
   main(args)
 File "scripts/generate_chain_data_cache.py", line 96, in main
   for d in p.imap_unordered(fn, files, chunksize=args.chunksize):
 File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 354, in <genexpr>
   return (item for chunk in result for item in chunk)
 File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/multiprocessing/pool.py", line 748, in next
   raise value
UnboundLocalError: local variable 'local_data' referenced before assignment

Any help would be highly appreciated

calmasri • Aug 26 '22 18:08

The issue turned out to be in the .pdb branch of parse_file, starting at line 47 of generate_chain_data_cache.py:

  elif(ext == ".pdb"):
       with open(os.path.join(args.data_dir, f), "r") as fp:
           pdb_string = fp.read()
         
       protein_object = protein.from_pdb_string(pdb_string, None)

       chain_dict = {} 
       chain_dict["seq"] = residue_constants.aatype_to_str_sequence(
           protein_object.aatype,
       )
       local_data["resolution"] = 0.

       cluster_size = chain_cluster_size_dict.get(file_id.upper(), -1)
       if(chain_cluster_size_dict is not None):
           cluster_size = chain_cluster_size_dict.get(
               full_name.upper(), -1
           )
           chain_dict["cluster_size"] = cluster_size

       out = {file_id: chain_dict}

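The variables local_data and full_name are never assigned in the .pdb branch, so the script fails with an UnboundLocalError as soon as it reaches a .pdb file in the data directory.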
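In case it helps anyone hitting the same error, here is a minimal sketch of how the .pdb branch could be written so that it only touches variables it defines itself. This is not the maintainers' fix. The helper name build_pdb_chain_entry is hypothetical, the sequence argument stands in for the string returned by residue_constants.aatype_to_str_sequence, and keying the cluster lookup by file_id is my assumption about what the cluster file contains for .pdb inputs.

```python
from typing import Dict, Optional


def build_pdb_chain_entry(
    file_id: str,
    sequence: str,
    chain_cluster_size_dict: Optional[Dict[str, int]] = None,
) -> Dict[str, dict]:
    """Hypothetical stand-in for the .pdb branch of parse_file."""
    chain_dict = {"seq": sequence}

    # Store the resolution on chain_dict; local_data is not assigned
    # in this branch, which is what raised the UnboundLocalError.
    chain_dict["resolution"] = 0.0

    # Only look up the cluster size when a cluster file was supplied.
    # Assumption: the lookup key is file_id, since full_name is not
    # defined for .pdb inputs.
    if chain_cluster_size_dict is not None:
        chain_dict["cluster_size"] = chain_cluster_size_dict.get(
            file_id.upper(), -1
        )

    return {file_id: chain_dict}


# Example with made-up identifiers and cluster sizes.
with_clusters = build_pdb_chain_entry("1abc_A", "MKV", {"1ABC_A": 12})
without_clusters = build_pdb_chain_entry("1abc_A", "MKV")
```

With a cluster dictionary the entry gets a cluster_size (or -1 if the key is missing). Without one, the key is simply left out instead of raising.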

calmasri • Sep 09 '22 16:09

Hello, the updated version of this file still does not correct this issue (I get the same error for the "full_name" variable). Anything new on this?

VasiPitsilou • Apr 21 '23 14:04