
update converted models to include tokenizer files

Open stas00 opened this issue 4 years ago • 0 comments

Update the released model files to include:

  1. the correct tokenizer files (t5-small or gpt2)
  2. a filled-out config.tokenizer_class
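
For reference, the two manual steps could look roughly like this when run against a local clone of one of the model repos. This is only a sketch, not an existing script in this repo: the helper name `add_tokenizer`, the file lists, and the class names `GPT2Tokenizer` / `T5Tokenizer` are my assumptions and should be checked against the actual gpt2 and t5-small repos before use.

```python
import json
import shutil
from pathlib import Path

# Assumption: tokenizer file names as shipped by the gpt2 and t5-small repos.
TOKENIZER_FILES = {
    "gpt2": ["vocab.json", "merges.txt", "tokenizer.json"],
    "t5-small": ["spiece.model", "tokenizer.json"],
}
# Assumption: class names to record in config.tokenizer_class.
TOKENIZER_CLASS = {"gpt2": "GPT2Tokenizer", "t5-small": "T5Tokenizer"}


def add_tokenizer(model_dir, tokenizer_dir, name):
    """Copy tokenizer files into model_dir and set tokenizer_class in config.json.

    Hypothetical helper: returns the list of file names that were copied.
    """
    model_dir, tokenizer_dir = Path(model_dir), Path(tokenizer_dir)

    # Step 1: copy whichever of the expected tokenizer files exist locally.
    copied = []
    for fname in TOKENIZER_FILES[name]:
        src = tokenizer_dir / fname
        if src.exists():
            shutil.copy(src, model_dir / fname)
            copied.append(fname)

    # Step 2: fill out tokenizer_class, leaving all other config keys untouched.
    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text())
    config["tokenizer_class"] = TOKENIZER_CLASS[name]
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")
    return copied
```

After running something like `add_tokenizer("gpt2-350m-en", "gpt2", "gpt2")` on local clones (both paths are placeholders), the changed files would still need to be committed and pushed to the hub by hand.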

HUB:

  • [ ] https://huggingface.co/bigscience/gpt2-13b-en
  • [ ] https://huggingface.co/bigscience/gpt2-1b3-en
  • [ ] https://huggingface.co/bigscience/gpt2-350m-en
  • [ ] https://huggingface.co/bigscience/tr3e-1B3-c4-checkpoints
  • [ ] https://huggingface.co/bigscience/tr3d-1B3-oscar-checkpoints
  • [ ] https://huggingface.co/bigscience/tr3m-1B3-pile-checkpoints

GCS:

  • [ ] gs://bigscience-backups/tr1-13B/

This will be automated in the future by https://github.com/bigscience-workshop/Megatron-DeepSpeed/issues/126, but for now do it manually, since it's just 2 sets of files.

stas00, Oct 06 '21 01:10