Bo Zhang

12 comments by Bo Zhang

Hi, I dug a little deeper into this bug by adding exception handling:

```
try:
    datapoint_idx = next(samples)
except StopIteration:
    print("samples.length = {}, idx = {}".format(sum(1 for _ in...
```
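If the truncated count in this handler iterates over `samples` itself, it will always print 0: once `next()` has raised `StopIteration`, the iterator is exhausted, and counting it consumes nothing. A minimal sketch that instead reports how many indices were drawn before the iterator ran dry, assuming `samples` is an ordinary Python iterator (`draw_indices` and `epoch_len` are hypothetical names, not OpenFold's):

```python
from typing import Iterator, List


def draw_indices(samples: Iterator[int], epoch_len: int) -> List[int]:
    """Draw up to epoch_len indices (hypothetical helper).

    If the iterator runs out early, report how many draws succeeded.
    """
    drawn: List[int] = []
    for _ in range(epoch_len):
        try:
            drawn.append(next(samples))
        except StopIteration:
            # The iterator is exhausted at this point; counting it now would
            # always give 0, so report the number already drawn instead.
            print("iterator exhausted after {} of {} draws".format(len(drawn), epoch_len))
            break
    return drawn


if __name__ == "__main__":
    it = iter(range(5))
    got = draw_indices(it, epoch_len=8)  # prints "iterator exhausted after 5 of 8 draws"
    print(got)                           # [0, 1, 2, 3, 4]
    print(sum(1 for _ in it))            # 0: nothing is left after exhaustion
```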

> Try increasing the length of the training epoch.

So this means setting `--train_epoch_len` to a larger value?

Yeah, I've done that. But I found that some chains don't have all 4 files (3 `.a3m` and 1 `.hhr`). By the way, I've set `--train_epoch_len` to 80000 but still...

```
srun python3 train_openfold.py \
    /pscratch/sd/b/bz186/openfold/data/pdb_mmcif/mmcif_files \
    /pscratch/sd/b/bz186/openfold/data/alignment_openfold \
    /pscratch/sd/b/bz186/openfold/data/pdb_mmcif/mmcif_files \
    /pscratch/sd/b/bz186/openfold/data/train_full_output \
    2021-10-10 \
    --template_release_dates_cache_path=/pscratch/sd/b/bz186/openfold/data/mmcif_cache.json \
    --precision=32 \
    --gpus=4 \
    --replace_sampler_ddp=True \
    --seed=42 \
    --deepspeed_config_path=/global/homes/b/bz186/openfold/deepspeed_config.json \
    --checkpoint_every_epoch \
    --obsolete_pdbs_file_path=/pscratch/sd/b/bz186/openfold/data/pdb_mmcif/obsolete.dat...
```

A single node with 4 GPUs; I ran it in interactive mode.

```
/pscratch/sd/b/bz186/openfold/data/alignment_openfold
- 11as_A
  - bfd_uniclust_hits.a3m
  - mgnify_hits.a3m
  - pdb70_hits.hhr
  - uniref90_hits.a3m
- 11ba_A
  - bfd_uniclust_hits.a3m
  - mgnify_hits.a3m
  - pdb70_hits.hhr
  - uniref90_hits.a3m
- 11ba_B
- 11bg_A
- 11bg_B
- 11gs_A
```

The directory tree is similar...
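To list which chain directories lack any of the four alignment files (three `.a3m` and one `.hhr`), a small scan over the alignment directory could help. This is a sketch that assumes every chain directory sits directly under the alignment root, as in the listing above, and uses the four file names from that listing; `find_incomplete_chains` is a hypothetical helper, not part of OpenFold.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "bfd_uniclust_hits.a3m",
    "mgnify_hits.a3m",
    "pdb70_hits.hhr",
    "uniref90_hits.a3m",
)


def find_incomplete_chains(alignment_dir: str) -> Dict[str, List[str]]:
    """Map each chain directory name to the expected files it is missing."""
    missing: Dict[str, List[str]] = {}
    for chain_dir in sorted(Path(alignment_dir).iterdir()):
        if not chain_dir.is_dir():
            continue  # skip stray files at the top level
        absent = [name for name in EXPECTED_FILES if not (chain_dir / name).is_file()]
        if absent:
            missing[chain_dir.name] = absent
    return missing
```

Calling `find_incomplete_chains("/pscratch/sd/b/bz186/openfold/data/alignment_openfold")` would return only the chains with gaps, so an empty dict means every chain has all four files.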

It prints some of the `chain_data_cache_entry` values, like this: `{'release_date': '2011-02-09', 'seq': 'MSAGKLPEGWVIAPVSTVTTLIRGVTYKKEQAINYLKDDYLPLIRANNIQNGKFDTTDLVFVPKNLVKESQKISPEDIVIAMSSGSKSVVGKSAHQHLPFECSFGAFCGVLRPEKLIFSGFIAHFTKSSLYRNKISSLSAGANINNIKPASFDLINIPIPPLAEQKIIAEKLDTLLAQVDSTKARFEQIPQILKRFRQAVLGGAVNGKLTEKWRNFEPQHSVFKKLNFESILTELRNGLSSKPNESGVGHPILRISSVRAGHVDQNDIRFLECSESELNRHKLQDGDLLFTRYNGSLEFVGVCGLLKKLQHQNLLYPDKLIRARLTKDALPEYIEIFFSSPSARNAMMNCVKTTSGQKGISGKDIKSQVVLLPPVKEQAEIVRRVEQLFAYADTIEKQVNNALARVNNLTQSILAKAFRGELTAQWRAENPDLISGENSAAALLEKIKAERAASGGKKASRKKS', 'resolution': 18.0, 'cluster_size': -1}` The counter inside the `if` block reached 26 before it crashed.
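The entry above carries a `resolution` of 18.0. As a hedged illustration of how a deterministic pre-filter over such entries might work, here is a hypothetical `passes_deterministic_filter` that rejects a chain whose resolution exceeds a cutoff or whose sequence is dominated by a single residue type. Both thresholds (9.0 Å and 0.8) are assumptions chosen to mirror the kind of quality filters described for AlphaFold 2 training; the real checks in OpenFold's `if` block may differ.

```python
from collections import Counter
from typing import Any, Dict


def passes_deterministic_filter(
    entry: Dict[str, Any],
    max_resolution: float = 9.0,      # assumed cutoff, in Angstrom
    max_single_aa_prop: float = 0.8,  # assumed cap on the most common residue's share
) -> bool:
    """Hypothetical filter over one chain_data_cache entry."""
    resolution = entry.get("resolution")
    if resolution is not None and resolution > max_resolution:
        return False  # structure too low-resolution under the assumed cutoff
    seq = entry.get("seq", "")
    if not seq:
        return False  # nothing to train on
    # Fraction of the sequence taken up by its single most frequent residue.
    most_common_count = Counter(seq).most_common(1)[0][1]
    return most_common_count / len(seq) <= max_single_aa_prop
```

Under these assumed thresholds, the entry printed above would be rejected on resolution alone (18.0 > 9.0), so if many cached chains look like it, few candidates would survive to the stochastic step.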


I printed `p` right after

```
p = get_stochastic_train_filter_prob(
    chain_data_cache_entry,
)
```

and almost all of the values are > 0.5:

```
p = 1.0
p = 0.609375
p = 0.880859375...
```
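One way to reason about these values: if each candidate that reaches this step is kept independently with probability `p` (an assumption about how `p` is used, not confirmed here), the expected number kept is simply the sum of the `p` values. With only a few dozen candidates getting this far, even high `p` values yield a small pool of indices, which fits the advice to lengthen the epoch. The helpers `expected_kept` and `stochastic_keep` below are hypothetical and use plain Bernoulli draws from Python's `random` module rather than whatever sampler OpenFold uses.

```python
import random
from typing import List, Sequence


def expected_kept(probs: Sequence[float]) -> float:
    """Expected count kept when each candidate is kept independently with prob p."""
    return sum(probs)


def stochastic_keep(indices: Sequence[int], probs: Sequence[float], seed: int = 42) -> List[int]:
    """Keep indices[i] with probability probs[i] via independent Bernoulli draws."""
    rng = random.Random(seed)  # seeded for reproducibility
    # rng.random() is in [0, 1), so p = 1.0 always keeps and p = 0.0 never does.
    return [i for i, p in zip(indices, probs) if rng.random() < p]


if __name__ == "__main__":
    probs = [1.0, 0.609375, 0.880859375]  # the first three values printed above
    print(expected_kept(probs))           # 2.490234375
    print(stochastic_keep([0, 1, 2], probs))
```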

I'm trying to do that, but we have some difficulty understanding the code, since we're computer science researchers and don't have the required protein knowledge.