
BLOOM+1: Adapting BLOOM model to support a new unseen language

19 multilingual-modeling issues

Paper: https://arxiv.org/abs/2012.07463 Code: https://github.com/dguo98/DiffPruning
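
For context on the technique: diff pruning keeps the pretrained weights frozen and learns a sparse, task-specific difference vector gated by a relaxed L0 penalty. A minimal PyTorch sketch of the idea (illustrative only, not the authors' implementation; `DiffPrunedLinear` is a hypothetical module name, and it assumes the layer has a bias):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffPrunedLinear(nn.Module):
    """Frozen pretrained linear layer plus a learned sparse diff (sketch)."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Pretrained weights stay frozen; only the diff and its gates train.
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        self.diff = nn.Parameter(torch.zeros_like(self.weight))
        self.log_alpha = nn.Parameter(torch.zeros_like(self.weight))  # gate params

    def gate(self) -> torch.Tensor:
        # Simplified hard-concrete gate: stochastic during training,
        # deterministic at eval; values are stretched then clipped to [0, 1].
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(u.log() - (1 - u).log() + self.log_alpha)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * 1.1 - 0.05).clamp(0.0, 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.gate()
        # Effective weight = frozen pretrained weight + sparse, gated diff.
        return F.linear(x, self.weight + z * self.diff, self.bias)
```

An L0-style penalty on the expected gate values (roughly `torch.sigmoid(self.log_alpha).sum()` added to the task loss) would push most diff entries to exactly zero.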

Current changes: just some unused / commented out code from `madx_run_clm.py`. There is more, but I was not certain why certain parts are commented out. We'll need to refactor the...

Paper: https://arxiv.org/pdf/2111.09839.pdf Code: https://github.com/varunnair18/FISH/blob/main/transformers/examples/text-classification/run_glue_sparse_update.py This will be relatively easy to add to our code.
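
The core of the FISH-mask approach is to rank parameters by an empirical diagonal Fisher estimate (accumulated squared gradients) and only ever update the top fraction. A rough sketch of that selection step (hypothetical helper names and `loss_fn` signature, not the linked repo's API):

```python
import torch

def compute_fish_mask(model, dataloader, loss_fn, keep_ratio=0.005, n_batches=8):
    """Keep the top `keep_ratio` of parameters by squared-gradient score (sketch)."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    for i, batch in enumerate(dataloader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    # Pick a global threshold so exactly the top-k entries are kept.
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {n: (s >= threshold).float() for n, s in scores.items()}

# During training, zero out gradients of unselected parameters before each
# optimizer step, e.g.:  p.grad.mul_(mask[n])
```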

The following info is for Bloom-1.3B and embedding-and-MADX-adapters (with the replace strategy), with the default bottleneck reduction size of 16.

```
Total frozen parameters: 1208602624
Total trainable parameters: 24979456
Total emb...
```
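
A counting helper along these lines (a hypothetical snippet, not necessarily the exact code that printed the numbers above) produces that breakdown:

```python
def report_parameter_counts(model):
    """Print frozen vs. trainable parameter totals for a PyTorch model."""
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total frozen parameters: {frozen}")
    print(f"Total trainable parameters: {trainable}")
```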

Prefix-Tuning is already supported in adapter-transformers. We just need to make it work for BLOOM.
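
Once BLOOM is wired into the library, usage should look roughly like the standard adapter-transformers flow (a sketch; the checkpoint name, adapter name, and prefix length are placeholders):

```python
from transformers import AutoModelForCausalLM
from transformers.adapters import PrefixTuningConfig

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b3")

# Add a prefix-tuning module and freeze everything except its parameters.
model.add_adapter("new_lang_prefix", config=PrefixTuningConfig(prefix_length=30))
model.train_adapter("new_lang_prefix")
model.set_active_adapters("new_lang_prefix")
```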

I am getting different results when running training and evaluation together versus separately. Rerunning evaluation after training (by removing `--do_train`) gives me a better result than running training+eval in a single invocation.
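
One plausible cause (an assumption worth ruling out) is that the eval-only run loads a saved checkpoint while the combined run evaluates the in-memory model after the last step, so the two runs may be scoring different weights. A small fingerprinting sketch to check this:

```python
import hashlib
import torch

def weight_fingerprint(model) -> str:
    """Hash the model weights so two runs can confirm they evaluate the
    same parameters (diagnostic sketch)."""
    h = hashlib.sha256()
    for name, tensor in sorted(model.state_dict().items()):
        h.update(name.encode())
        h.update(tensor.detach().to(torch.float32).cpu().numpy().tobytes())
    return h.hexdigest()[:16]

# Log weight_fingerprint(model) immediately before evaluation in both the
# combined run and the eval-only run; differing hashes mean the runs are
# evaluating different checkpoints, not hitting a data or metric issue.
```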

The idea of this issue is to modify the [megatron-deepspeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed) repository code that we use for training all models, in order to track the progress of validation loss on several validation...
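
The intended behavior, sketched in plain PyTorch rather than the Megatron-DeepSpeed hooks themselves (the helper and loader names are illustrative):

```python
import torch

def evaluate_validation_splits(model, val_loaders, loss_fn):
    """Return the mean loss on each named validation split (sketch)."""
    model.eval()
    losses = {}
    with torch.no_grad():
        for name, loader in val_loaders.items():
            total, batches = 0.0, 0
            for batch in loader:
                total += loss_fn(model, batch).item()
                batches += 1
            losses[name] = total / max(batches, 1)
    model.train()
    return losses

# e.g. val_loaders = {"en": en_loader, "new_lang": new_lang_loader};
# logging each entry separately gives one validation-loss curve per split.
```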