# Martyna Patelka

Results: 18 issues filed by Martyna Patelka

## πŸ› Bug When running benchmarking script with `--checkpoint_activations True` we get: > AssertionError: t54580_out for rematerialisation This issue is present for the following models: 'Llama-3-70B', 'Gemma-2-27b', 'longchat-13b-16k', 'Mistral-7B-v0.2', 'vicuna-7b-v1.5-16k',...

bug
mixology
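The `--checkpoint_activations` flag above enables activation checkpointing, the mechanism whose rematerialisation step triggers the assertion. A minimal PyTorch sketch of that mechanism (the `Block` model and tensor sizes here are hypothetical, not from the benchmarking script):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

block = Block()
x = torch.randn(4, 128, requires_grad=True)

# Activations inside `block` are discarded during the forward pass and
# recomputed (rematerialised) during backward, trading compute for memory.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)
```

A compiler such as Thunder must decide which intermediate tensors to rematerialise in the recomputed region; the `AssertionError: t54580_out for rematerialisation` suggests one such tensor could not be resolved.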

## πŸ› Bug This might be related to [old OOM issue](https://github.com/Lightning-AI/lightning-thunder/issues/474), but the models and # nodes is different, so I decided to create another one. We get OOM error,...

memory use
mixology

## πŸ› Bug When using DDP with Dynamo+Thunder we get: > AttributeError: \'Float8Tensor\' object has no attribute \'_fp8_attrs\' This issue affects the following models: 'dolly-v2-3b', 'Mistral-7B-v0.1', 'tiny-llama-1.1b', 'stablecode-completion-alpha-3b', 'Phi-3-mini-4k-instruct', 'falcon-7b'...

mixology

## πŸ› Bug When training models: 'vicuna-7b-v1.5-16k', 'longchat-13b-16k', 'Mistral-7B-v0.2', 'falcon-180B', 'Llama-3-70B', 'CodeLlama-34b-hf' with FSDP and FP8 we get KeyError: 'scaling_fwd'. This might be also issue with Transformer Engine,, so I'm...

TransformerEngine
mixology
thunderfx

## πŸš€ Feature Make Thunder + Mistral-7B-v0.1 as fast as Thunder + Llama3-8b (relative to Eager mode). ### Motivation Below are the data for: * Llama3-8b: ![image](https://github.com/user-attachments/assets/953bf346-8192-400b-adea-74597c3bbbde) * Mistral-7B-v0.1: ![image](https://github.com/user-attachments/assets/a4665d7a-7a7d-4d20-bf6b-6720d710d846) *...

enhancement
performance

## πŸ› Bug As can be seen below Thunder is slower than torch.compile for single gpu training of falcon-7b: ![image](https://github.com/user-attachments/assets/5c5254b5-373a-408a-83fd-29e2ba7d53d2) Below are results for ThunderFX for multi-gpu training : ![image](https://github.com/user-attachments/assets/be90e478-e63a-4244-a3a5-1372dbed4750)...

## πŸ› Bug When running Mistral-7B-v0.1 we get OOM error. The same configuration passes for torch.compile. ### To Reproduce Steps to reproduce the behavior: Please use: 1 node(s), each with...

thunderfx

## πŸ› Bug Recently we got OOM errors causing failures of Gemma-2-2b (in canary runs) and distributed training of stablecode-completion-alpha-3b. ### To Reproduce Please use: 1 node(s), each with 8...

thunderfx