metaseq
Repo for external large-scale work
**Patch Description**
Describe your changes

**Testing steps**
Describe how you tested your changes
## Issue
Training requires flattened models (any MP, with FSDP); inference requires unflattened models with FSDP 1. We wanted AML jobs that train a model (producing flattened checkpoint output), then reshard...
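A minimal sketch of the unflatten step the pipeline above needs between training and inference: rebuilding named parameters from rank-ordered flat FSDP shards. The metadata layout (`(name, shape)` pairs) is an illustrative assumption, not metaseq's actual checkpoint format.

```python
# Hypothetical sketch: reconstruct full (unflattened) parameters from
# flattened FSDP shards. Metadata layout is illustrative only.
import math
import torch

def unflatten_shards(shards, param_meta):
    """shards: list of 1D tensors, one per FSDP rank, in rank order.
    param_meta: list of (name, shape) describing the original params."""
    flat = torch.cat(shards)  # full flattened parameter buffer
    state_dict, offset = {}, 0
    for name, shape in param_meta:
        numel = math.prod(shape)
        # slice this parameter's elements out of the flat buffer
        state_dict[name] = flat[offset:offset + numel].view(shape)
        offset += numel
    return state_dict

# toy usage: two shards together holding a 2x2 weight and a 2-element bias
meta = [("linear.weight", (2, 2)), ("linear.bias", (2,))]
shards = [torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0, 6.0])]
sd = unflatten_shards(shards, meta)
```

A real resharding job would also have to handle padding that FSDP appends to make shards evenly sized; this sketch assumes no padding.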
⚠️ This PR likely won't work directly, but I wanted to share code from our fork that may be modified for integration ⚠️ ## Issue (This may not be 100%...
⚠️ This PR is not intended to be merged directly, but to demonstrate documentation from our fork ⚠️ ## Issue Current documentation in the Metaseq repo is very minimal. - Given...
# Issues
## 1. Inconsistent checkpoint filenames saved by the trainer
In our pipeline we often have a sequence of steps such as (train, reshard/unflatten, evaluate). The output files of the training...
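One way to work around inconsistent trainer output names in such a pipeline is a small normalization pass before the reshard/unflatten step. The filename patterns below are hypothetical stand-ins, not metaseq's actual naming scheme.

```python
# Hypothetical sketch: map varying trainer checkpoint names (e.g.
# "checkpoint_3_1000-model_part-0-shard0.pt") onto one predictable stem
# so downstream steps can glob for it. Patterns are illustrative only.
import re

def normalize_name(filename, target_stem="checkpoint_last"):
    m = re.match(r"checkpoint[^-]*(-.*)?\.pt$", filename)
    if m is None:
        return filename  # not a checkpoint file; leave untouched
    suffix = m.group(1) or ""  # keep any model-part/shard suffix
    return f"{target_stem}{suffix}.pt"
```

In practice this would run as a rename step (e.g. `os.rename`) between the train and reshard jobs.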
## ❓ Questions and Help
### Before asking:
- [x] search the issues.
- [x] search the docs.
#### What is your question?
The OPT-IML paper evaluates the models on...
This addresses Issue 642. When the stop token is \n\n, generation should stop after generating two newlines: check the previously generated token, and if it is...
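The check described above can be sketched with a toy decoding loop: when the tokenizer emits single "\n" tokens, stop as soon as the current and previous tokens are both "\n". The generator function here is a stand-in for the real decoding loop, not metaseq's API.

```python
# Hedged sketch of the \n\n stop condition: stop once two consecutive
# newline tokens have been generated.
def generate_until_double_newline(next_token_fn, max_tokens=32):
    tokens = []
    for _ in range(max_tokens):
        tok = next_token_fn(tokens)
        tokens.append(tok)
        # check the previously generated token: "\n" twice in a row == "\n\n"
        if tok == "\n" and len(tokens) >= 2 and tokens[-2] == "\n":
            break
    return tokens

# toy usage: a fixed token stream standing in for model sampling
stream = iter(["Hello", " world", "\n", "\n", "ignored"])
out = generate_until_double_newline(lambda _tokens: next(stream))
# out == ["Hello", " world", "\n", "\n"]
```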
## 🐛 Bug
I use the following script:
CUDA_VISIBLE_DEVICES="0,1,2,3" metaseq-train --task streaming_language_modeling \
  data/pile-test/ \
  --num-workers 4 \
  --reset-dataloader \
  --vocab-filename ./vocab/gpt2-vocab.json \
  --merges-filename ./vocab/gpt2-merges.txt \
  ...
There are ways to reshard a trained model into an inference model, but how can one resume training from the consolidated model (like LLaMA)?
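The missing direction asked about above is the inverse of resharding: splitting a consolidated (unflattened) state dict back into N flat shards so training can resume. A hedged sketch, where the padding and metadata handling are illustrative assumptions rather than metaseq's actual format:

```python
# Hypothetical sketch: flatten a consolidated state dict into num_shards
# even flat shards plus (name, shape) metadata for later unflattening.
import torch

def flatten_to_shards(state_dict, num_shards):
    meta = [(name, tuple(t.shape)) for name, t in state_dict.items()]
    flat = torch.cat([t.reshape(-1) for t in state_dict.values()])
    # zero-pad so the buffer divides evenly across shards
    pad = (-flat.numel()) % num_shards
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    shards = list(flat.chunk(num_shards))
    return shards, meta

# toy usage: a 2x3 weight and a 2-element bias split across 3 shards
sd = {"w": torch.arange(6.0).view(2, 3), "b": torch.zeros(2)}
shards, meta = flatten_to_shards(sd, 3)
```

A real implementation would also have to restore optimizer state per shard; this sketch covers the parameter buffers only.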