Martyna Patelka
I tested it, and `benchmark.model.get_parameter('lm_head.weight')[:10]` still gives shape [10, 4096] for Thunder and [10] for Eager. Also, is it expected that the values of the parameters are different between Thunder and...
In the case of Thunder it's a Thunder module (`thunder.core.module.ThunderModule`); in the case of Eager it's the original module.
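For reference, a minimal sketch of how those two observations can be checked outside the benchmark; the `TinyLM` module and its sizes are hypothetical stand-ins for `benchmark.model`, and there is no FSDP here:

```
import torch
import thunder


class TinyLM(torch.nn.Module):
    """Hypothetical stand-in for the benchmark model; only lm_head matters here."""

    def __init__(self):
        super().__init__()
        self.lm_head = torch.nn.Linear(4096, 512, bias=False)

    def forward(self, x):
        return self.lm_head(x)


eager_model = TinyLM()
thunder_model = thunder.jit(eager_model)

# Module types: Thunder wraps the model, Eager keeps the original torch.nn.Module.
print(type(eager_model))    # <class '__main__.TinyLM'>
print(type(thunder_model))  # <class 'thunder.core.module.ThunderModule'>

# The query from the comment above. In this toy (no FSDP) both sides report
# torch.Size([10, 4096]); the [10, 4096] vs [10] mismatch came from the
# benchmark/FSDP run.
print(eager_model.get_parameter("lm_head.weight")[:10].shape)
print(thunder_model.get_parameter("lm_head.weight")[:10].shape)
```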
The value of the loss is also different between Thunder and Eager:

**Eager:**
> iter 0: loss 11.9375, iter time: 6618.87ms, t: 8192
> iter 1: loss 9.8750, iter time: 1466.43ms, ...
I reproduced the issue manually on a cluster - here you can find full logs: [slurm-930652.txt](https://github.com/user-attachments/files/16247072/slurm-930652.txt)
Hi all! I wrote recently that the issue is fixed, but I had checked it only for one model (Gemma-7b). The error is still present (checked on INTERNAL_IMAGE:pjnl-20240830_ for Mistral-7B-v0.2,...
In the recent run the issue was present only in 3 cases, across 2 models ('CodeLlama-34b-hf', 'falcon-40b'), and I checked that it's not present at all for 2 cases (one...
Hi! So this issue was present recently in 7 cases, all of them using fp8. Below are the reproduction instructions (a quick GPU-count sanity check is sketched after the block):

```
Please use: 1 node(s), each with 8 GPUs.
Image "INTERNAL_IMAGE:pjnl-20241011"
Training...
```
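The sanity check mentioned above is just confirming that the node actually exposes 8 GPUs before launching; a sketch, not part of the official instructions:

```
# Quick check that the repro node matches "1 node(s), each with 8 GPUs".
import torch

print("CUDA available:", torch.cuda.is_available())
print("visible GPUs:  ", torch.cuda.device_count())  # expected: 8
```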
For the most recent set of issues I used this script to reproduce the error:

```
#!/bin/bash
#SBATCH -A YOUR_ACCOUNT
#SBATCH -p batch
#SBATCH -J YOUR_JOB_NAME
#SBATCH -N 2
#SBATCH ...
```
Actually we see the same results for other models. Is one issue enough to track all of them? Below are the results: 
Hi! Please let me know when we will be ready to check FSDP 2 again :)