
BLOOM+1: Adapting BLOOM model to support a new unseen language

19 multilingual-modeling issues

Paper: https://arxiv.org/abs/2012.07463 Code: https://github.com/dguo98/DiffPruning
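
For context on the technique: diff pruning keeps the pretrained weights frozen and learns a sparse, task-specific difference vector gated by a relaxed L0 penalty. A minimal PyTorch sketch of the idea (illustrative only, not the authors' implementation; `DiffPrunedLinear` is a hypothetical module name, and it assumes the layer has a bias):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffPrunedLinear(nn.Module):
    """Frozen pretrained linear layer plus a learned sparse diff (sketch)."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Pretrained weights stay frozen; only the diff and its gates train.
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        self.diff = nn.Parameter(torch.zeros_like(self.weight))
        self.log_alpha = nn.Parameter(torch.zeros_like(self.weight))  # gate params

    def gate(self) -> torch.Tensor:
        # Simplified hard-concrete gate: stochastic during training,
        # deterministic at eval; values are stretched then clipped to [0, 1].
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(u.log() - (1 - u).log() + self.log_alpha)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * 1.1 - 0.05).clamp(0.0, 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.gate()
        # Effective weight = frozen pretrained weight + sparse, gated diff.
        return F.linear(x, self.weight + z * self.diff, self.bias)
```

An L0-style penalty on the expected gate values (roughly `torch.sigmoid(self.log_alpha).sum()` added to the task loss) would push most diff entries to exactly zero.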

Current changes: just some unused / commented out code from `madx_run_clm.py`. There is more, but I was not certain why certain parts are commented out. We'll need to refactor the...

Paper: https://arxiv.org/pdf/2111.09839.pdf Code: https://github.com/varunnair18/FISH/blob/main/transformers/examples/text-classification/run_glue_sparse_update.py This will be relatively easy to add to our code.
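
The core of the FISH-mask approach is to rank parameters by an empirical diagonal Fisher estimate (accumulated squared gradients) and only ever update the top fraction. A rough sketch of that selection step (hypothetical helper names and `loss_fn` signature, not the linked repo's API):

```python
import torch

def compute_fish_mask(model, dataloader, loss_fn, keep_ratio=0.005, n_batches=8):
    """Keep the top `keep_ratio` of parameters by squared-gradient score (sketch)."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    for i, batch in enumerate(dataloader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    # Pick a global threshold so exactly the top-k entries are kept.
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {n: (s >= threshold).float() for n, s in scores.items()}

# During training, zero out gradients of unselected parameters before each
# optimizer step, e.g.:  p.grad.mul_(mask[n])
```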

The following info is for Bloom-1.3B and embedding-and-MADX-adapters (with the replace strategy), with the default bottleneck reduction size of 16.

```
Total frozen parameters: 1208602624
Total trainable parameters: 24979456
Total emb...
```
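
A counting helper along these lines (a hypothetical snippet, not necessarily the exact code that printed the numbers above) produces that breakdown:

```python
def report_parameter_counts(model):
    """Print frozen vs. trainable parameter totals for a PyTorch model."""
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total frozen parameters: {frozen}")
    print(f"Total trainable parameters: {trainable}")
```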

Prefix-Tuning is already supported in adapter-transformers. We just need to make it work for BLOOM.
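
Once BLOOM is wired into the library, usage should look roughly like the standard adapter-transformers flow (a sketch; the checkpoint name, adapter name, and prefix length are placeholders):

```python
from transformers import AutoModelForCausalLM
from transformers.adapters import PrefixTuningConfig

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b3")

# Add a prefix-tuning module and freeze everything except its parameters.
model.add_adapter("new_lang_prefix", config=PrefixTuningConfig(prefix_length=30))
model.train_adapter("new_lang_prefix")
model.set_active_adapters("new_lang_prefix")
```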

I am getting different results when running training and evaluation together versus separately. Rerunning evaluation after training (by removing `--do_train`) gives me a better result than running training+eval in a single invocation.
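
One plausible cause (an assumption worth ruling out) is that the eval-only run loads a saved checkpoint while the combined run evaluates the in-memory model after the last step, so the two runs may be scoring different weights. A small fingerprinting sketch to check this:

```python
import hashlib
import torch

def weight_fingerprint(model) -> str:
    """Hash the model weights so two runs can confirm they evaluate the
    same parameters (diagnostic sketch)."""
    h = hashlib.sha256()
    for name, tensor in sorted(model.state_dict().items()):
        h.update(name.encode())
        h.update(tensor.detach().to(torch.float32).cpu().numpy().tobytes())
    return h.hexdigest()[:16]

# Log weight_fingerprint(model) immediately before evaluation in both the
# combined run and the eval-only run; differing hashes mean the runs are
# evaluating different checkpoints, not hitting a data or metric issue.
```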

The idea of this issue is to modify the [megatron-deepspeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed) repository code that we use for training all models, in order to track the progress of validation loss on several validation...
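
The intended behavior, sketched in plain PyTorch rather than the Megatron-DeepSpeed hooks themselves (the helper and loader names are illustrative):

```python
import torch

def evaluate_validation_splits(model, val_loaders, loss_fn):
    """Return the mean loss on each named validation split (sketch)."""
    model.eval()
    losses = {}
    with torch.no_grad():
        for name, loader in val_loaders.items():
            total, batches = 0.0, 0
            for batch in loader:
                total += loss_fn(model, batch).item()
                batches += 1
            losses[name] = total / max(batches, 1)
    model.train()
    return losses

# e.g. val_loaders = {"en": en_loader, "new_lang": new_lang_loader};
# logging each entry separately gives one validation-loss curve per split.
```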