
About DummyData

Open GuoxiaWang opened this issue 2 years ago • 4 comments

Could you upload DummyData for each training config, e.g. initial_training, finetuning, model_1, model_2, model_3, model_4, model_5, model_1_ptm, model_2_ptm, model_3_ptm, model_4_ptm, model_5_ptm?

GuoxiaWang avatar May 13 '22 06:05 GuoxiaWang

Shapewise, the inputs for each of these models differ only by the size of the input crop. As such, I'm not sure how instructive it would be to upload separate sample batches for each preset. Is there a particular use you have in mind?

gahdritz avatar May 15 '22 21:05 gahdritz

@gahdritz I fixed it.

However, I suggest uploading a tiny, preprocessed, and complete training dataset.

With it, we could run the training script directly, since the tiny preprocessed data would be all we need:

mmcif_dir, alignment_dir, template_mmcif_dir, mmcif_cache.json, chain_data_cache.json

# In multi-GPU settings, the seed must be specified.
python3 train_openfold.py mmcif_dir/ alignment_dir/ template_mmcif_dir/ \
    2021-10-10 \
    --template_release_dates_cache_path mmcif_cache.json \
    --precision 16 \
    --gpus 8 --replace_sampler_ddp=True \
    --seed 42 \
    --deepspeed_config_path deepspeed_config.json \
    --checkpoint_every_epoch \
    --resume_from_ckpt ckpt_dir/ \
    --train_chain_data_cache_path chain_data_cache.json

GuoxiaWang avatar May 16 '22 01:05 GuoxiaWang

In a couple of weeks (for real this time) we'll be uploading a couple of hundred thousand processed MSAs and template hits along with original OpenFold weights. Stay tuned!

gahdritz avatar May 16 '22 23:05 gahdritz

@gahdritz Ok, thx.

But I still suggest also uploading a tiny set of processed MSAs and template hits alongside the full dataset and the original OpenFold weights.

Pros:

  • Newcomers to OpenFold can quickly run the training scripts and learn from your code.
  • Someone who only wants to test performance or make a contribution can just download a tiny_train.tgz and skip all the data-related scripts.

GuoxiaWang avatar May 17 '22 02:05 GuoxiaWang
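
As a rough illustration of how a tiny_train.tgz like the one suggested above might be assembled from an already-processed dataset, here is a minimal Python sketch. The directory layout, the key format of the two JSON caches, and the function names build_tiny_dataset and make_archive are all assumptions made for illustration, not OpenFold APIs. template_mmcif_dir is not handled here.

```python
import json
import shutil
import tarfile
from pathlib import Path


def build_tiny_dataset(src, dst, chain_ids):
    """Copy only the data for `chain_ids` from `src` into `dst`.

    Assumed (hypothetical) layout under `src`:
      mmcif_dir/<pdb_id>.cif
      alignment_dir/<pdb_id>_<chain>/...
      chain_data_cache.json   keyed by "<pdb_id>_<chain>"
      mmcif_cache.json        keyed by "<pdb_id>"
    """
    src, dst = Path(src), Path(dst)
    chain_ids = list(chain_ids)
    pdb_ids = {cid.split("_")[0] for cid in chain_ids}

    (dst / "mmcif_dir").mkdir(parents=True, exist_ok=True)
    (dst / "alignment_dir").mkdir(parents=True, exist_ok=True)

    # One structure file per PDB entry.
    for pdb_id in sorted(pdb_ids):
        shutil.copy2(src / "mmcif_dir" / f"{pdb_id}.cif", dst / "mmcif_dir")

    # One alignment directory per chain.
    for cid in chain_ids:
        shutil.copytree(
            src / "alignment_dir" / cid,
            dst / "alignment_dir" / cid,
            dirs_exist_ok=True,  # requires Python 3.8+
        )

    # Keep only the cache entries that match the selected subset.
    caches = (
        ("chain_data_cache.json", set(chain_ids)),
        ("mmcif_cache.json", pdb_ids),
    )
    for name, keep in caches:
        with open(src / name) as f:
            full = json.load(f)
        subset = {k: v for k, v in full.items() if k in keep}
        with open(dst / name, "w") as f:
            json.dump(subset, f)

    return dst


def make_archive(dataset_dir, archive_path):
    """Pack the tiny dataset directory into a gzip-compressed tarball."""
    dataset_dir = Path(dataset_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(dataset_dir, arcname=dataset_dir.name)
    return Path(archive_path)
```

Under these assumptions, calling build_tiny_dataset("full_data", "tiny_train", ["1abc_A"]) followed by make_archive("tiny_train", "tiny_train.tgz") would produce an archive whose contents can be passed to the training command in place of the full directories. The chain ID here is a placeholder.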