train error about chain_data_cache

Open liuxm117 opened this issue 3 years ago • 8 comments

Hi, I generated chain_data_cache.json, but when I run train_openfold.py it fails at line 301 of openfold/data/data_module.py. The training dataset is much larger than chain_data_cache.json, so the cache does not contain every chain in the training set.

liuxm117 avatar Feb 18 '22 09:02 liuxm117

The chain_data_cache.json needs to be generated for the training set. Could you elaborate on chain_data_cache being too small?

gahdritz avatar Feb 18 '22 18:02 gahdritz

I generated chain_data_cache.json from pdb_mmcifs, which contains 180,000+ .cif files. However, chain_data_cache.json only includes entries with a numeric suffix, not entries with a letter suffix. The training set generated by precompute_alignments.py contains many files suffixed with letters, such as 1aya_A and 1aya_B, and none of these are in chain_data_cache.json.

liuxm117 avatar Feb 22 '22 06:02 liuxm117

Could you share a) the contents of the mmcif directory and b) the keys of the chain data cache? The chain names for the alignments should be the same as the ones in the chain data cache.

gahdritz avatar Feb 22 '22 22:02 gahdritz
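
The comparison requested above (alignment chain names versus chain data cache keys) can be sketched in a few lines of Python. This is a minimal sketch, not part of OpenFold: the helper name `missing_from_cache` is hypothetical, and it assumes that each entry in the alignment directory is named by its chain ID (for example `1aya_A`) and that chain_data_cache.json is a JSON object keyed by chain ID.

```python
import json
import os


def missing_from_cache(alignment_dir, cache_path):
    """Return alignment entry names that are not keys of the chain data cache.

    Assumption: every entry in `alignment_dir` is named by its chain ID,
    and the cache file is a JSON object whose keys are chain IDs.
    """
    with open(cache_path) as f:
        # Iterating over the loaded JSON object yields its keys.
        cache_keys = set(json.load(f))

    chain_ids = set(os.listdir(alignment_dir))

    # Chains that have alignments but no cache entry.
    return sorted(chain_ids - cache_keys)
```

Calling `missing_from_cache("alignments/", "chain_data_cache.json")` (both paths are placeholders) would list every chain that the training set expects but the cache lacks. A non-empty result containing names like `1aya_A` would point to the naming mismatch discussed in this thread.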

a) 1ay0.cif 1ay2.cif 1ay4.cif 1ay6.cif 1ay8.cif 1aya.cif 1ayc.cif 1aye.cif 1ayg.cif 1ayj.cif 1ayl.cif 1ayn.cif 1ayp.cif 1ayu.cif 1ayw.cif 1ayy.cif 1ay1.cif 1ay3.cif 1ay5.cif 1ay7.cif 1ay9.cif 1ayb.cif 1ayd.cif 1ayf.cif 1ayi.cif 1ayk.cif 1aym.cif 1ayo.cif 1ayr.cif 1ayv.cif 1ayx.cif 1ayz.cif

b) 1aym_1 1aym_2 1aym_3 1aym_4 1ayn_1 1ayn_2 1ayn_3 1ayn_4

liuxm117 avatar Feb 23 '22 03:02 liuxm117

Could you re-run the chain data cache script on the newest commit? I pushed some changes today.

gahdritz avatar Feb 23 '22 04:02 gahdritz

Yes, it works. You now set cluster_size to -1 for any chain that is not in cluster_size_dict. I'd like to know why these chains do not belong to any cluster.

liuxm117 avatar Feb 23 '22 06:02 liuxm117

Probably because you're not providing a cluster file. If you are providing one, then those chains do not appear in it.

gahdritz avatar Feb 23 '22 17:02 gahdritz
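
The behavior discussed above, where a chain missing from the cluster sizes gets -1, can be illustrated with a small Python sketch. It is not the code from the OpenFold script: the function names are hypothetical, and the cluster file format (one cluster per line, chain IDs separated by whitespace) is an assumption made only for this example.

```python
def parse_cluster_lines(lines):
    """Map each chain ID to the size of the cluster it appears in.

    Assumption: each line lists the whitespace-separated chain IDs
    of one cluster.
    """
    sizes = {}
    for line in lines:
        members = line.split()
        for chain_id in members:
            sizes[chain_id] = len(members)
    return sizes


def cluster_size_for(chain_id, cluster_size_dict, default=-1):
    """Look up a chain's cluster size, falling back to `default` if absent."""
    return cluster_size_dict.get(chain_id, default)


# With no cluster file the dictionary is empty, so every chain gets -1.
no_clusters = {}
sizes = parse_cluster_lines(["1aym_1 1aym_2", "1ayn_1"])
```

Under these assumptions, `cluster_size_for("1aya_A", no_clusters)` falls back to -1 because no cluster file was provided, and `cluster_size_for("1aya_A", sizes)` falls back to -1 because the chain does not appear in any listed cluster. These are the two cases named in the reply above.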

Thanks.

liuxm117 avatar Mar 18 '22 02:03 liuxm117