Martyna Patelka

25 comments

I tested it, and `benchmark.model.get_parameter('lm_head.weight')[:10]` still gives shape [10, 4096] for Thunder and [10] for Eager. Also, it is expected that the values of the parameters are different between Thunder and...
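
For context, a minimal sketch of the kind of shape check being run here, using a small stand-in module rather than the actual benchmark model (the `Tiny` class and its vocab size are assumptions; only `get_parameter('lm_head.weight')[:10]` and the 4096 hidden size come from the report above):
```
import torch
import thunder

# Stand-in for the benchmark model: just an `lm_head` linear layer.
# The 4096 hidden size matches the shapes quoted above; the output
# size (128) is an arbitrary assumption to keep the example small.
class Tiny(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lm_head = torch.nn.Linear(4096, 128, bias=False)

    def forward(self, x):
        return self.lm_head(x)

eager = Tiny()
jitted = thunder.jit(eager)

# Without sharding, both slices should report torch.Size([10, 4096]).
print(eager.get_parameter("lm_head.weight")[:10].shape)
print(jitted.get_parameter("lm_head.weight")[:10].shape)
```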

In the case of Thunder it's a Thunder module (`thunder.core.module.ThunderModule`); in the case of Eager it's the original module.
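
A quick way to see this wrapper difference on a toy module (the toy `Linear` is an assumption; only the `thunder.core.module.ThunderModule` type comes from the comment above):
```
import torch
import thunder

# thunder.jit wraps the original module in a ThunderModule.
m = torch.nn.Linear(8, 8)
tm = thunder.jit(m)

print(type(m))   # <class 'torch.nn.Linear'>
print(type(tm))  # <class 'thunder.core.module.ThunderModule'>
```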

The value of loss is also different between Thunder and Eager:

**Eager:**
> iter 0: loss 11.9375, iter time: 6618.87ms, t: 8192
> iter 1: loss 9.8750, iter time: 1466.43ms,...
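
For illustration, a minimal parity check on a toy module; the sizes, inputs, and loss function here are assumptions, not the benchmark's training loop, but with identical weights and inputs the eager and Thunder losses should agree closely:
```
import torch
import thunder

torch.manual_seed(0)

# Toy module jitted with Thunder; the loss is computed on the outputs
# of both the eager and the jitted forward for the same batch.
model = torch.nn.Linear(16, 4)
jitted = thunder.jit(model)

x = torch.randn(2, 16)
target = torch.randint(0, 4, (2,))
loss_fn = torch.nn.CrossEntropyLoss()

print(loss_fn(model(x), target).item())
print(loss_fn(jitted(x), target).item())
```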

I reproduced the issue manually on a cluster - here you can find full logs: [slurm-930652.txt](https://github.com/user-attachments/files/16247072/slurm-930652.txt)

Hi all! I recently wrote that the issue is fixed, but I had checked it only for one model (Gemma-7b). The error is still present (checked on INTERNAL_IMAGE:pjnl-20240830_ for Mistral-7B-v0.2,...

In the most recent run the issue was present in only 3 cases across 2 models ('CodeLlama-34b-hf', 'falcon-40b'), and I checked that it's not present at all for 2 cases (one...

Hi! So this issue was recently present in 7 cases, all of which use fp8. Below are reproduction instructions:
```
Please use: 1 node(s), each with 8 GPUs.
Image "INTERNAL_IMAGE:pjnl-20241011"
Training...
```
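
For background, a standalone sketch of the fp8 mechanism these cases exercise, using Transformer Engine's autocast; this is an assumed illustration, not the benchmark's actual code path:
```
import torch
import transformer_engine.pytorch as te

# An fp8 forward pass through a Transformer Engine linear layer.
# Requires an fp8-capable GPU (e.g. H100); sizes are arbitrary
# assumptions chosen to satisfy fp8 GEMM dimension constraints.
layer = te.Linear(4096, 4096, bias=False).cuda()
x = torch.randn(16, 4096, device="cuda")

with te.fp8_autocast(enabled=True):
    y = layer(x)
print(y.shape)
```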

For the most recent set of issues I used this script to reproduce the error:
```
#!/bin/bash
#SBATCH -A YOUR_ACCOUNT
#SBATCH -p batch
#SBATCH -J YOUR_JOB_NAME
#SBATCH -N 2
#SBATCH...
```

Actually, we see the same results for other models. Is one issue enough to track all of them? Below are the results: ![image](https://github.com/user-attachments/assets/edf630e2-67b3-4f1b-89b1-95d215678119)

Hi! Please let me know when we will be ready to check FSDP 2 again :)