ParlAI

Feature: Add the ability to log model artifacts to Weights & Biases

Open · parambharat opened this issue 2 years ago · 3 comments

Patch description

This PR adds the ability to log model artifacts to Weights & Biases. Logging model artifacts enables complete reproducibility of the experiments that produced the trained model, and it also allows the model to be reused for finetuning, evaluation, and collaboration.

Testing steps

Run the demo training example with the optional arguments for logging to wandb:

parlai train_model -t personachat \
-m transformer/ranker \
-mf /tmp/model_tr6 \
--n-layers 1 \
--embedding-size 300 \
--ffn-size 600 \
--n-heads 4 \
--num-epochs 2 \
-veps 0.25 \
-bs 64 \
-lr 0.001 \
--dropout 0.1 \
--embedding-type fasttext_cc \
--candidates batch \
--wandb-log True \
--wandb-project parlai \
--wandb-log-model True

This should log the model, dictionary, and the trainstats file to Weights & Biases. Here is an example of the logged artifacts from the above command.
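The PR's own implementation is not reproduced in this thread, so as a rough illustration only: a minimal sketch that uploads those files as a single artifact of type "model" using the public `wandb` Python API (`wandb.Artifact`, `Artifact.add_file`, `Run.log_artifact`). The helpers `model_artifact_files` and `log_model_artifact` are hypothetical, and the suffix list assumes ParlAI's convention of saving the dictionary and train stats next to the model file as `<model_file>.dict` and `<model_file>.trainstats`.

```python
import os
from typing import List, Optional

# Assumed ParlAI naming convention for a model saved with -mf <model_file>:
# the model itself, its dictionary, and the training statistics.
# ParlAI also writes "<model_file>.opt"; append ".opt" here to include it.
ARTIFACT_SUFFIXES = ("", ".dict", ".trainstats")


def model_artifact_files(model_file: str) -> List[str]:
    """Return the files belonging to `model_file` that exist on disk (hypothetical helper)."""
    candidates = [model_file + suffix for suffix in ARTIFACT_SUFFIXES]
    return [path for path in candidates if os.path.isfile(path)]


def log_model_artifact(run, model_file: str, name: Optional[str] = None):
    """Upload the model files as one W&B artifact of type "model" (hypothetical helper).

    `run` is the object returned by `wandb.init()`. wandb is imported lazily so
    that `model_artifact_files` works even when wandb is not installed.
    """
    import wandb

    files = model_artifact_files(model_file)
    if not files:
        raise FileNotFoundError(f"no model files found for {model_file!r}")
    # Default the artifact name to the model file's basename, e.g. "model_tr6".
    artifact = wandb.Artifact(name or os.path.basename(model_file), type="model")
    for path in files:
        # Inside the artifact, each file is stored under its basename by default.
        artifact.add_file(path)
    run.log_artifact(artifact)
    return artifact
```

For example, after the training command above finishes, `log_model_artifact(wandb.init(project="parlai"), "/tmp/model_tr6")` would upload whichever of `model_tr6`, `model_tr6.dict`, and `model_tr6.trainstats` exist as one versioned artifact.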

parambharat, Aug 05 '22 10:08

@klshuster: I've merged upstream main into the branch. To clarify, logging a model as an artifact does not just record the filenames; it also uploads the files themselves to wandb as model artifacts. These artifacts can then be reused by scripts that resume a run, and they can be shared with collaborators on the project team. See the Weights & Biases documentation on artifacts for more details on the core concepts.
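To illustrate the reuse side, a minimal sketch that fetches a logged model with the public `wandb.Api().artifact(...)` call and `Artifact.download()`, returning a local path that could be passed to ParlAI via `-mf` or `--init-model`. The helper names are hypothetical, and it assumes the artifact's name matches the model file's basename and that the files sit at the artifact root.

```python
import os
from typing import Optional


def artifact_ref(project: str, name: str, alias: str = "latest",
                 entity: Optional[str] = None) -> str:
    """Build a reference like "project/name:alias" or "entity/project/name:alias"."""
    prefix = f"{entity}/{project}" if entity else project
    return f"{prefix}/{name}:{alias}"


def download_model_file(project: str, name: str, alias: str = "latest",
                        entity: Optional[str] = None,
                        root: Optional[str] = None) -> str:
    """Download a logged model artifact and return the local model file path.

    Hypothetical helper. Assumes "<name>", "<name>.dict", ... are stored at the
    artifact root so the returned path works as a ParlAI model file.
    """
    import wandb  # imported lazily; needs an installed, logged-in wandb client

    api = wandb.Api()
    artifact = api.artifact(artifact_ref(project, name, alias, entity), type="model")
    local_dir = artifact.download(root=root)
    return os.path.join(local_dir, name)
```

Inside an active run, `run.use_artifact(ref)` followed by `.download()` achieves the same result while also recording the artifact as an input to that run, which is what makes the lineage between training and finetuning runs visible.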

parambharat, Aug 09 '22 06:08

Thanks for the clarification; in that case, let's leave the default as False. Are there any storage issues for larger models? For example, a 3B-parameter model with its optimizer state could take up quite a bit of space.

klshuster, Aug 09 '22 18:08

AFAIK, the per-user storage limit is 100 GB, but in practice there is no limit on model size when logging artifacts.
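A back-of-the-envelope check of the 3B case raised above, as a sketch under stated assumptions: fp32 weights (4 bytes per parameter) and an Adam-style optimizer that keeps two extra fp32 buffers per parameter (first and second moments). Mixed-precision training or sharded checkpoints will give different numbers.

```python
def checkpoint_gb(n_params: int, bytes_per_value: int = 4,
                  optimizer_buffers: int = 2) -> float:
    """Approximate checkpoint size in GB (10**9 bytes): weights plus optimizer state."""
    total_bytes = n_params * bytes_per_value * (1 + optimizer_buffers)
    return total_bytes / 10**9


full = checkpoint_gb(3_000_000_000)
weights_only = checkpoint_gb(3_000_000_000, optimizer_buffers=0)
print(f"3B params with Adam state: ~{full:.0f} GB")         # ~36 GB
print(f"3B params, weights only: ~{weights_only:.0f} GB")    # ~12 GB
print(f"full checkpoints within 100 GB: {int(100 // full)}")  # 2
```

Under these assumptions, a single full checkpoint of that size uses roughly a third of the 100 GB figure mentioned above.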

parambharat, Aug 12 '22 11:08