Martyna Patelka

18 issue results for Martyna Patelka

## πŸ› Bug With newest version of Docker image (tested on 2024-04-22 ) training with thunder.jit with additional inductor executor gives OOM error. ### To Reproduce Before each testing each...

bug
memory use
mixology

## πŸš€ Feature Have a method to annotate pieces of training code (e.g. [benchmark_litgpt](https://github.com/Lightning-AI/lightning-thunder/blob/main/thunder/benchmarks/benchmark_litgpt.py)) so we can easily and automatically compare the effectiveness of different compilation methods / versions of Thunder...

enhancement
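One possible shape for such an annotation is a timing context manager keyed by region and compile variant. The sketch below is a hypothetical illustration in plain Python; `annotate`, `TIMINGS`, and `summarize` are invented names for this sketch, not existing benchmark_litgpt or Thunder APIs:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical registry of timings, keyed by (region, variant) so the same
# annotated region can be compared across compilation methods
# (e.g. "eager", "torch.compile", "thunder").
TIMINGS = defaultdict(list)

@contextmanager
def annotate(region, variant):
    """Record wall-clock time of the enclosed block under (region, variant)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS[(region, variant)].append(time.perf_counter() - start)

def summarize():
    """Return mean seconds per (region, variant) for a quick comparison."""
    return {key: sum(vals) / len(vals) for key, vals in TIMINGS.items()}
```

Usage would look like `with annotate("forward", "thunder"): model(x)`, with `summarize()` called once at the end of a run to diff variants automatically.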

## πŸ› Bug With newest version of Docker image (tested on pjnl-20240512, on pjnl-20240511 it worked) there are import errors for falcon-7b, Nous-Hermes-13, Llama-3-8B and other models irrespective of used...

bug

## πŸ› Bug With newest version of Docker image (tested on 2024-04-28 ) training with thunder.jit on 8xA100 it's not possible to run Platypus-30B and vicuna-33b-v1.3 models. This is the...

bug
distributed
mixology

I was able to train the Llama3-8b model with Thunder for a few steps and then save it. However, when I later try to use `litgpt generate` or `litgpt chat` with...

triage review

## πŸ› Bug Gemma-7b with FSDP zero3 trained on 2 nodes with 8 H100 each gives OOM error for BS = 2 for both `thunder_cudnn` and `thunder_inductor_cat_cudnn`. The same configuration...

bug
memory use
mixology

## πŸ› Bug After training Llama-3-8b on 8 A100 for 10 iterations with eager mode I printed the model weights: ``` torch_dist.barrier() weights_after_training = benchmark.model.lm_head.weight[:10].data.to(device="cpu", dtype=torch.float32).numpy() if global_rank in [0,...

bug
distributed
mixology
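The check behind that snippet (do the per-rank copies of the weights still agree after training?) can be sketched without torch. `ranks_in_sync` is a hypothetical helper for this sketch, operating on plain float lists that stand in for the per-rank numpy snapshots:

```python
def ranks_in_sync(snapshots, tol=1e-6):
    """Return True if per-rank weight snapshots (lists of floats) agree
    elementwise within tol, as DDP/FSDP-synchronized weights should."""
    reference = snapshots[0]
    for snap in snapshots[1:]:
        if len(snap) != len(reference):
            return False
        if any(abs(a - b) > tol for a, b in zip(reference, snap)):
            return False
    return True
```

In the actual benchmark the snapshots would be the `lm_head.weight[:10]` arrays gathered from each rank; any divergence between ranks after training points at a synchronization bug.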

## πŸ› Bug For a few models ( Platypus-30B with FSDP zero3, Gemma7b with DDP and vicuna-33b-v1.3 with FSDP zero3) we get segmentation fault error when trying to use fp8...

cudnn
mixology

## πŸ› Bug When running the benchmarks for Mixtral-8x7B-v0.1 for Eager mode we get error: > 0: [rank0]: File "/workspace/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py", line 887, in benchmark_main 0: [rank0]: print(f"Tokens/s: {benchmark.perf_metrics['tokens_per_sec']:.02f}") 0: [rank0]:...

mixology

## πŸ› Bug When benchmarking model: 'Mixtral-8x7B-v0.1' we get OOM errors even with --checkpoint_activations True The same configurations works for torch.compile. Might be related to [https://github.com/Lightning-AI/lightning-thunder/issues/194](https://github.com/Lightning-AI/lightning-thunder/issues/194). The same issue occurs...

bug
triage review
mixology
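Activation checkpointing, which `--checkpoint_activations True` enables, trades memory for recomputation: only segment inputs are kept, and intermediate activations are rebuilt when needed. The sketch below is a minimal pure-Python illustration of that trade-off, with hypothetical stand-in layer functions rather than the benchmark's actual model:

```python
def run_segment(layers, x):
    """Apply a list of layer functions in order, returning the final output."""
    for layer in layers:
        x = layer(x)
    return x

class CheckpointedSegment:
    """Store only the segment input; recompute intermediates on demand."""
    def __init__(self, layers, x):
        self.layers = layers
        self.saved_input = x          # O(1) storage instead of O(len(layers))
        self.output = run_segment(layers, x)

    def recompute_intermediates(self):
        # Re-run the forward pass to rebuild the activations that were
        # deliberately not stored (this recompute is the extra cost paid
        # in exchange for the memory savings).
        acts, x = [], self.saved_input
        for layer in self.layers:
            x = layer(x)
            acts.append(x)
        return acts

# Toy three-"layer" segment on a scalar input.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
seg = CheckpointedSegment(layers, 5)
```

An OOM despite this flag, as reported above, suggests that peak memory is dominated by something checkpointing cannot reduce (e.g. parameters, optimizer state, or allocations the executor keeps alive).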