Martyna Patelka
## 🐛 Bug With the newest version of the Docker image (tested on 2024-04-22), training with thunder.jit plus the additional inductor executor gives an OOM error. ### To Reproduce Before testing each...
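For context, a minimal sketch of the kind of call under test, assuming a small stand-in module; the additional inductor executor is configured by the benchmark script itself and is omitted here, and the model and shapes below are illustrative only:

```python
import torch
import thunder

# Stand-in for the litgpt model from the report; sizes are illustrative.
model = torch.nn.Linear(4096, 4096, device="cuda")
jitted = thunder.jit(model)  # the OOM appears when an extra inductor executor is also enabled

x = torch.randn(8, 4096, device="cuda")
jitted(x).sum().backward()  # one training-style forward + backward step
```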
## 🚀 Feature Have a method to annotate pieces of training code (e.g. [benchmark_litgpt](https://github.com/Lightning-AI/lightning-thunder/blob/main/thunder/benchmarks/benchmark_litgpt.py)) so we can easily and automatically compare the effectiveness of different compilation methods / versions of Thunder...
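A hypothetical sketch of what such an annotation could look like, assuming a simple context-manager API; `annotate_region` and its timing store are made-up names here, not an existing Thunder or litgpt interface:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

_timings = defaultdict(list)  # region name -> list of wall-clock durations

@contextmanager
def annotate_region(name: str):
    """Record wall-clock time for a named piece of training code."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _timings[name].append(time.perf_counter() - start)

# Usage inside a training loop, once per compilation method under comparison:
with annotate_region("forward+backward/thunder.jit"):
    pass  # run the annotated piece of training code here
```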
## 🐛 Bug With the newest version of the Docker image (tested on pjnl-20240512; pjnl-20240511 still worked) there are import errors for falcon-7b, Nous-Hermes-13, Llama-3-8B, and other models irrespective of the used...
## 🐛 Bug With the newest version of the Docker image (tested on 2024-04-28), it is not possible to run the Platypus-30B and vicuna-33b-v1.3 models when training with thunder.jit on 8xA100. This is the...
I was able to train the Llama3-8b model with Thunder for a few steps and then save it. However, when I later try to use `litgpt generate` or `litgpt chat` with...
OOM errors for Gemma-7b, pythia-12b, Llama-2-13b-hf, and Nous-Hermes-13b with FSDP zero3 and 2x8 H100
## 🐛 Bug Gemma-7b with FSDP zero3 trained on 2 nodes with 8 H100 each gives an OOM error for BS = 2 with both `thunder_cudnn` and `thunder_inductor_cat_cudnn`. The same configuration...
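For reference, a rough sketch of what the "FSDP zero3" configuration corresponds to in plain PyTorch (full sharding of parameters, gradients, and optimizer state); the model is a stand-in and the script is assumed to be launched with torchrun across the 2x8 H100 setup:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

dist.init_process_group("nccl")  # rank/world-size env vars supplied by torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(4096, 4096, device="cuda")  # stand-in for Gemma-7b
model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)  # "zero3"
```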
## 🐛 Bug After training Llama-3-8b on 8 A100 for 10 iterations in eager mode I printed the model weights:
```
torch_dist.barrier()
weights_after_training = benchmark.model.lm_head.weight[:10].data.to(device="cpu", dtype=torch.float32).numpy()
if global_rank in [0,...
```
## 🐛 Bug For a few models (Platypus-30B with FSDP zero3, Gemma7b with DDP, and vicuna-33b-v1.3 with FSDP zero3) we get a segmentation fault when trying to use fp8...
## 🐛 Bug When running the benchmarks for Mixtral-8x7B-v0.1 in eager mode we get the error:
> 0: [rank0]: File "/workspace/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py", line 887, in benchmark_main
> 0: [rank0]: print(f"Tokens/s: {benchmark.perf_metrics['tokens_per_sec']:.02f}")
> 0: [rank0]:...
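The traceback above is cut off, so the exact exception is unknown; purely as an illustration, a defensive version of the failing print (assuming `perf_metrics` can lack the key for this configuration) might look like:

```python
# Hypothetical guard; `perf_metrics` stands in for benchmark.perf_metrics.
perf_metrics = {}
tokens_per_sec = perf_metrics.get("tokens_per_sec")
if tokens_per_sec is not None:
    print(f"Tokens/s: {tokens_per_sec:.02f}")
else:
    print("Tokens/s: unavailable for this run")
```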
## 🐛 Bug When benchmarking the model 'Mixtral-8x7B-v0.1' we get OOM errors even with `--checkpoint_activations True`. The same configuration works for torch.compile. Might be related to [https://github.com/Lightning-AI/lightning-thunder/issues/194](https://github.com/Lightning-AI/lightning-thunder/issues/194). The same issue occurs...
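As a reminder of what `--checkpoint_activations True` enables, here is a minimal sketch using the stock PyTorch API: activations inside the wrapped block are recomputed during backward instead of being kept in memory, trading compute for memory; the tiny module below is illustrative only:

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU())
x = torch.randn(4, 512, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)  # activations recomputed in backward
y.sum().backward()
```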