openfold
Training error about chain_data_cache
Hi, I generated chain_data_cache.json, but when I run train_openfold.py it fails at line 301 of openfold/data/data_module.py. The training dataset is much larger than chain_data_cache.json, so the cache does not contain every chain in the training set.
The chain_data_cache.json needs to be generated for the training set. Could you elaborate on chain_data_cache being too small?
I generated chain_data_cache.json from pdb_mmcifs, which contains more than 180,000 .cif files. However, chain_data_cache.json only includes entries with a numeric suffix, not entries with a letter suffix. The training set generated by precompute_alignments.py contains many entries with letter suffixes, such as 1aya_A and 1aya_B, and none of these are in chain_data_cache.json.
Could you share a) the contents of the mmcif directory and b) the keys of the chain data cache? The chain names for the alignments should be the same as the ones in the chain data cache.
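For anyone hitting the same error, the comparison requested above can be sketched roughly as follows. This is a minimal sketch, not part of OpenFold: the helper names `chains_missing_from_cache` and `check_dirs` are hypothetical, and it assumes the alignment directory holds one subdirectory per chain (for example `1aya_A`) and that the cache is a JSON object keyed by chain name.

```python
import json
import os


def chains_missing_from_cache(cache_keys, alignment_names):
    """Return alignment chain names that have no entry in the cache, sorted."""
    return sorted(set(alignment_names) - set(cache_keys))


def check_dirs(cache_path, alignment_dir):
    """Hypothetical helper: compare a cache file against an alignment directory."""
    # Keys of the chain data cache, e.g. "1aym_1"
    with open(cache_path) as f:
        cache_keys = list(json.load(f).keys())
    # Assumption: one subdirectory per chain, e.g. "1aya_A"
    alignment_names = [
        name for name in os.listdir(alignment_dir)
        if os.path.isdir(os.path.join(alignment_dir, name))
    ]
    return chains_missing_from_cache(cache_keys, alignment_names)


# Chain names taken from this thread
cache_keys = ["1aym_1", "1aym_2", "1aym_3", "1aym_4"]
alignment_names = ["1aya_A", "1aya_B"]
print(chains_missing_from_cache(cache_keys, alignment_names))
# → ['1aya_A', '1aya_B']
```

A non-empty result means some alignment chains have no matching cache key, which is the naming mismatch described in this thread.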
a) 1ay0.cif 1ay2.cif 1ay4.cif 1ay6.cif 1ay8.cif 1aya.cif 1ayc.cif 1aye.cif 1ayg.cif 1ayj.cif 1ayl.cif 1ayn.cif 1ayp.cif 1ayu.cif 1ayw.cif 1ayy.cif 1ay1.cif 1ay3.cif 1ay5.cif 1ay7.cif 1ay9.cif 1ayb.cif 1ayd.cif 1ayf.cif 1ayi.cif 1ayk.cif 1aym.cif 1ayo.cif 1ayr.cif 1ayv.cif 1ayx.cif 1ayz.cif
b) 1aym_1 1aym_2 1aym_3 1aym_4 1ayn_1 1ayn_2 1ayn_3 1ayn_4
Could you re-run the chain data cache script on the newest commit? I pushed some changes today.
Yeah, it works. You set cluster_size to -1 for chains that are not in cluster_size_dict. I'd like to know why these chains do not belong to any cluster.
Probably because you're not providing a cluster file. If you are providing one, then those chains do not appear in it.
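To illustrate the behaviour described above, here is a rough sketch of how a chain can end up with a cluster size of -1. It is not the actual cache script: the function names are hypothetical, and the cluster file format (one cluster per line, chain IDs separated by whitespace) is an assumption.

```python
def cluster_sizes_from_text(cluster_text):
    """Map each chain ID to the size of the cluster (line) it appears in."""
    sizes = {}
    for line in cluster_text.splitlines():
        chain_ids = line.split()
        for chain_id in chain_ids:
            sizes[chain_id] = len(chain_ids)
    return sizes


def cluster_size_for(chain_id, sizes):
    """Return the cluster size, or -1 if the chain is in no cluster."""
    return sizes.get(chain_id, -1)


# Assumed format: one cluster per line, whitespace-separated chain IDs
cluster_text = "1aya_A 1aya_B\n1ayb_A\n"
sizes = cluster_sizes_from_text(cluster_text)

print(cluster_size_for("1aya_A", sizes))  # → 2
print(cluster_size_for("1ayb_A", sizes))  # → 1
print(cluster_size_for("1ayz_A", sizes))  # → -1 (not listed in any cluster)

# With no cluster file at all, every lookup falls back to -1
print(cluster_size_for("1aya_A", {}))  # → -1
```

Under these assumptions, -1 simply means "no cluster information available for this chain", either because no cluster file was given or because the chain is absent from it.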
thanks