Vitaliy Chiley

64 comments by Vitaliy Chiley

You could start with this image: `mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04` from [here](https://docs.mosaicml.com/projects/composer/en/latest/getting_started/installation.html#pytorch-images).

If you are in `/home/llm-foundry/scripts/train/`, you should run `python ../../llmfoundry/data/text_data.py --local_path ./my-copy-c4 --split val_small`.

If you are in `/home/llm-foundry`, you should run `python llmfoundry/data/text_data.py --local_path ./my-copy-c4 --split val_small`.

You can...
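The two commands differ only in how the same script is reached from the current directory. A minimal sketch of that relationship follows, assuming the repo lives at `/home/llm-foundry` as in the comment above; the helper name `script_path_from` is hypothetical and not part of llm-foundry.

```python
import posixpath

# Assumed locations, taken from the paths quoted in the comment.
REPO_ROOT = "/home/llm-foundry"
SCRIPT = posixpath.join(REPO_ROOT, "llmfoundry/data/text_data.py")


def script_path_from(cwd: str) -> str:
    """Return the path to text_data.py relative to the given working directory."""
    return posixpath.relpath(SCRIPT, start=cwd)


# Path to pass to `python` when running from scripts/train/
from_train = script_path_from("/home/llm-foundry/scripts/train")
# Path to pass to `python` when running from the repo root
from_root = script_path_from("/home/llm-foundry")

print(from_train)
print(from_root)
```

The printed paths should match the ones used in the two commands above.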

Can you add the config you are running?

```
KeyError: ('2-.-0-.-0-842f0fbd42a6607893f7134cdd9d16f2-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-f24b6aa9b101a518b6a4a6bddded372e-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.float32, torch.bfloat16, torch.bfloat16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32',...
```

This [error](https://github.com/mosaicml/llm-foundry/issues/143#issuecomment-1552638882) tells you the issue. Your dataset is outputting data with left padding. MPT does not support training with left padding. This is a dataset issue.
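To confirm that a dataset is producing left-padded batches before training, a check along these lines may help. This is only a sketch, not the check MPT itself performs: it assumes the batch exposes an attention mask where `1` marks a real token and `0` marks padding, and the function name `has_left_padding` is hypothetical.

```python
from typing import Sequence


def has_left_padding(attention_mask: Sequence[Sequence[int]]) -> bool:
    """Return True if any row has a padding position (0) before a real token (1)."""
    for row in attention_mask:
        seen_pad = False
        for m in row:
            if m == 0:
                seen_pad = True
            elif seen_pad:
                # A real token appears after padding, so padding is on the left.
                return True
    return False


# Illustrative masks (made up for this example).
right_padded = [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
left_padded = [[0, 0, 1, 1, 1], [1, 1, 1, 1, 1]]

print(has_left_padding(right_padded))
print(has_left_padding(left_padded))
```

If the check fires, the fix belongs in the dataset or collator (for example, the tokenizer's padding side), not in the model.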

~~blocking composer pr: https://github.com/mosaicml/composer/pull/2229 (waiting for new composer img)~~ merged

Issue: using Torch2 checkpointing caused the [run](https://wandb.ai/mosaic-ml/torch2_test/runs/nscry8zv) to crash:
```
Eval metrics/eval/LanguagePerplexity: 60.0989
Traceback (most recent call last):
  File "/llm-foundry/scripts/train/train.py", line 254, in <module>
    main(cfg)
  File "/llm-foundry/scripts/train/train.py", line 243, in main...
```

With `parameters['fsdp_config']['use_orig_params'] = False`, the checkpoint is not broken and everything runs fine.
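A minimal sketch of applying that override to a nested config dict is shown below. Only the `fsdp_config` / `use_orig_params` key comes from the comment above; the starting contents of `parameters` are a placeholder for illustration.

```python
# Placeholder config; a real run config would carry many more keys.
parameters = {"fsdp_config": {"use_orig_params": True}}

# Create the fsdp_config section if it is missing, then disable use_orig_params.
parameters.setdefault("fsdp_config", {})["use_orig_params"] = False

print(parameters["fsdp_config"])
```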

Running from [this branch](https://github.com/mosaicml/llm-foundry/compare/main...vchiley:llm-foundry:torch2_test?expand=1) I still hit the same ckpt issue (with `attn_impl: torch`) when running using `mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04` as the base img. If I run in interactive mode and install...

@eracah identified the cause as a problem with how optimizers were implemented in Composer.