Martyna Patelka
## 🐛 Bug With the newest version of the Docker image (tested on 2024-04-22), training with thunder.jit plus the additional inductor executor gives an OOM error. ### To Reproduce Before testing each...
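For context, a minimal sketch of the kind of call under test, assuming a small stand-in module; the additional inductor executor is configured by the benchmark script itself and is omitted here, and the model and shapes below are illustrative only:

```python
import torch
import thunder

# Stand-in for the litgpt model from the report; sizes are illustrative.
model = torch.nn.Linear(4096, 4096, device="cuda")
jitted = thunder.jit(model)  # the OOM appears when an extra inductor executor is also enabled

x = torch.randn(8, 4096, device="cuda")
jitted(x).sum().backward()  # one training-style forward + backward step
```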
## 🚀 Feature Have a method to annotate pieces of training code (e.g. [benchmark_litgpt](https://github.com/Lightning-AI/lightning-thunder/blob/main/thunder/benchmarks/benchmark_litgpt.py)) so we can easily and automatically compare the effectiveness of different compilation methods / versions of Thunder...
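A hypothetical sketch of what such an annotation could look like, assuming a simple context-manager API; `annotate_region` and its timing store are made-up names here, not an existing Thunder or litgpt interface:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

_timings = defaultdict(list)  # region name -> list of wall-clock durations

@contextmanager
def annotate_region(name: str):
    """Record wall-clock time for a named piece of training code."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _timings[name].append(time.perf_counter() - start)

# Usage inside a training loop, once per compilation method under comparison:
with annotate_region("forward+backward/thunder.jit"):
    pass  # run the annotated piece of training code here
```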
## 🐛 Bug With the newest version of the Docker image (tested on pjnl-20240512; pjnl-20240511 still worked) there are import errors for falcon-7b, Nous-Hermes-13, Llama-3-8B, and other models irrespective of the used...
## 🐛 Bug With the newest version of the Docker image (tested on 2024-04-28), it is not possible to run the Platypus-30B and vicuna-33b-v1.3 models when training with thunder.jit on 8xA100. This is the...
I was able to train the Llama3-8b model with Thunder for a few steps and then save it. However, when I later try to use `litgpt generate` or `litgpt chat` with...
OOM errors for Gemma-7b, pythia-12b, Llama-2-13b-hf, and Nous-Hermes-13b with FSDP zero3 and 2x8 H100
## 🐛 Bug Gemma-7b with FSDP zero3 trained on 2 nodes with 8 H100 each gives an OOM error for BS = 2 with both `thunder_cudnn` and `thunder_inductor_cat_cudnn`. The same configuration...
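For reference, a rough sketch of what the "FSDP zero3" configuration corresponds to in plain PyTorch (full sharding of parameters, gradients, and optimizer state); the model is a stand-in and the script is assumed to be launched with torchrun across the 2x8 H100 setup:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

dist.init_process_group("nccl")  # rank/world-size env vars supplied by torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(4096, 4096, device="cuda")  # stand-in for Gemma-7b
model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)  # "zero3"
```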
## 🐛 Bug After training Llama-3-8b on 8 A100 for 10 iterations in eager mode I printed the model weights:
```
torch_dist.barrier()
weights_after_training = benchmark.model.lm_head.weight[:10].data.to(device="cpu", dtype=torch.float32).numpy()
if global_rank in [0,...
```
## 🐛 Bug For a few models (Platypus-30B with FSDP zero3, Gemma7b with DDP, and vicuna-33b-v1.3 with FSDP zero3) we get a segmentation fault when trying to use fp8...
## 🐛 Bug When running the benchmarks for Mixtral-8x7B-v0.1 in eager mode we get the error:
> 0: [rank0]: File "/workspace/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py", line 887, in benchmark_main
> 0: [rank0]: print(f"Tokens/s: {benchmark.perf_metrics['tokens_per_sec']:.02f}")
> 0: [rank0]:...
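The traceback above is cut off, so the exact exception is unknown; purely as an illustration, a defensive version of the failing print (assuming `perf_metrics` can lack the key for this configuration) might look like:

```python
# Hypothetical guard; `perf_metrics` stands in for benchmark.perf_metrics.
perf_metrics = {}
tokens_per_sec = perf_metrics.get("tokens_per_sec")
if tokens_per_sec is not None:
    print(f"Tokens/s: {tokens_per_sec:.02f}")
else:
    print("Tokens/s: unavailable for this run")
```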
## 🐛 Bug When benchmarking the model 'Mixtral-8x7B-v0.1' we get OOM errors even with `--checkpoint_activations True`. The same configuration works for torch.compile. Might be related to [https://github.com/Lightning-AI/lightning-thunder/issues/194](https://github.com/Lightning-AI/lightning-thunder/issues/194). The same issue occurs...
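As a reminder of what `--checkpoint_activations True` enables, here is a minimal sketch using the stock PyTorch API: activations inside the wrapped block are recomputed during backward instead of being kept in memory, trading compute for memory; the tiny module below is illustrative only:

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU())
x = torch.randn(4, 512, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)  # activations recomputed in backward
y.sum().backward()
```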