Martyna Patelka
## 🐛 Bug

When running the benchmarking script with `--checkpoint_activations True` we get:

> AssertionError: t54580_out for rematerialisation

This issue is present for the following models: 'Llama-3-70B', 'Gemma-2-27b', 'longchat-13b-16k', 'Mistral-7B-v0.2', 'vicuna-7b-v1.5-16k', ...
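For context, here is a minimal sketch of the kind of setup that flag is presumed to enable: activation checkpointing around each transformer block, then jitting with Thunder. The toy `Block`/`Model` modules and sizes below are placeholders rather than the benchmark's real litgpt model, and whether `--checkpoint_activations True` maps onto `torch.utils.checkpoint` exactly like this is an assumption, not something the report confirms.

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint as checkpoint
import thunder


class Block(nn.Module):
    # Placeholder stand-in for a transformer block.
    def __init__(self, dim: int = 4096) -> None:
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))


class Model(nn.Module):
    def __init__(self, n_blocks: int = 4) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(Block() for _ in range(n_blocks))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Recompute each block's activations during backward instead of storing them.
            x = checkpoint.checkpoint(block, x, use_reentrant=False)
        return x


model = Model().cuda()
jitted = thunder.jit(model)  # the rematerialisation assertion is reported under this path
out = jitted(torch.randn(2, 4096, device="cuda", requires_grad=True))
out.sum().backward()
```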
## 🐛 Bug

This might be related to the [old OOM issue](https://github.com/Lightning-AI/lightning-thunder/issues/474), but the models and the number of nodes are different, so I decided to create a separate issue. We get an OOM error, ...
## 🐛 Bug

When using DDP with Dynamo+Thunder we get:

> AttributeError: 'Float8Tensor' object has no attribute '_fp8_attrs'

This issue affects the following models: 'dolly-v2-3b', 'Mistral-7B-v0.1', 'tiny-llama-1.1b', 'stablecode-completion-alpha-3b', 'Phi-3-mini-4k-instruct', 'falcon-7b' ...
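A rough sketch of the DDP + Dynamo + Thunder wiring being described, assuming `thunder.dynamo.ThunderCompiler` as the `torch.compile` backend; the toy `nn.Linear` stands in for the listed models, and the FP8 layers that actually produce Transformer Engine's `Float8Tensor` are not shown here.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from thunder.dynamo import ThunderCompiler

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder model; the report covers the litgpt models listed above.
model = torch.nn.Linear(1024, 1024).cuda()
ddp_model = DDP(model, device_ids=[local_rank])

# torch.compile routes FX graphs to Thunder through the ThunderCompiler backend ("ThunderFX").
compiled = torch.compile(ddp_model, backend=ThunderCompiler())

x = torch.randn(8, 1024, device="cuda")
compiled(x).sum().backward()
```

Launched with `torchrun --nproc_per_node=<num_gpus>`; since the error mentions `Float8Tensor`, the real reproduction presumably also enables FP8 (Transformer Engine) on top of this setup.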
## 🐛 Bug

When training the models 'vicuna-7b-v1.5-16k', 'longchat-13b-16k', 'Mistral-7B-v0.2', 'falcon-180B', 'Llama-3-70B', 'CodeLlama-34b-hf' with FSDP and FP8 we get `KeyError: 'scaling_fwd'`. This might also be an issue with Transformer Engine, so I'm ...
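A rough sketch of the FSDP + FP8 combination in question, assuming Thunder's `thunder.distributed.fsdp` transform together with Transformer Engine's `te.Linear` and `fp8_autocast`; the single `te.Linear` is a placeholder for the listed models, and the exact wiring in the benchmark script may differ.

```python
import os
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling
import thunder
import thunder.distributed

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Placeholder model; the benchmark trains full litgpt transformer models instead.
model = te.Linear(4096, 4096).cuda()
model = thunder.distributed.fsdp(model)  # assumed Thunder FSDP transform
model = thunder.jit(model)

x = torch.randn(8, 4096, device="cuda", requires_grad=True)
# Transformer Engine keeps per-module FP8 state (amax history, scales) under keys such as
# 'scaling_fwd', which is presumably where the reported KeyError originates.
with te.fp8_autocast(enabled=True, fp8_recipe=DelayedScaling()):
    out = model(x)
out.sum().backward()
```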
## 🚀 Feature

Make Thunder + Mistral-7B-v0.1 as fast as Thunder + Llama3-8b (relative to eager mode).

### Motivation

Below are data for:

* Llama3-8b: [figure]
* Mistral-7B-v0.1: [figure]
* ...
## 🐛 Bug

As can be seen below, Thunder is slower than torch.compile for single-GPU training of falcon-7b:

[figure]

Below are results for ThunderFX for multi-GPU training: ...
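For reference, a small self-contained timing harness of the kind one could use to compare the two compilation paths on a single GPU; the toy MLP below is only a stand-in for falcon-7b, so absolute numbers will not match the benchmark script's figures.

```python
import time
import torch
import thunder


def bench(step, iters: int = 20) -> float:
    # Warm up, then time `iters` training steps on the current CUDA device.
    for _ in range(3):
        step()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


def make_step(compile_fn):
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
    ).cuda()
    compiled = compile_fn(model)
    x = torch.randn(8, 4096, device="cuda")

    def step():
        compiled(x).sum().backward()

    return step


print("thunder  s/iter:", bench(make_step(thunder.jit)))
print("inductor s/iter:", bench(make_step(torch.compile)))
```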
## 🐛 Bug

When running Mistral-7B-v0.1 we get an OOM error. The same configuration passes for torch.compile.

### To Reproduce

Steps to reproduce the behavior: please use 1 node(s), each with ...
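Since the comparison here is about memory rather than speed, a sketch of how peak memory for one training step can be compared between the two paths; the small stand-in model obviously will not OOM, but the measurement pattern carries over to the Mistral-7B-v0.1 run in the benchmark script.

```python
import torch
import thunder


def peak_memory_mb(compile_fn) -> float:
    # Run a single forward+backward step and report the CUDA peak allocation in MiB.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
    ).cuda()
    compiled = compile_fn(model)
    x = torch.randn(8, 4096, device="cuda")
    compiled(x).sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20


print("thunder  peak MiB:", peak_memory_mb(thunder.jit))
print("inductor peak MiB:", peak_memory_mb(torch.compile))
```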
## 🐛 Bug

Recently we got OOM errors causing failures of Gemma-2-2b (in canary runs) and of distributed training of stablecode-completion-alpha-3b.

### To Reproduce

Please use 1 node(s), each with 8 ...