Aflah

125 comments by Aflah

Thanks! Sorry I missed replying earlier.

Hey, I just happened to stumble on this PR. I noticed that you're using imports from the liger library; however, I believe it is now deeply integrated within...

Ah, nice. Yep, this seems like an easy fix based on the discussion under that issue.

Hi, are there any benchmarks I can refer to for the speedups brought by TE in a mixed-precision BF16-FP32 training run?

CC: @rasbt @Andrei-Aksionov Just bumping this onto your radar, as it is a continuation of the OLMo PR.

I tried using the code to process the dataset; however, it doesn't seem to work for the train set due to size issues. Is there a way to reduce how...

A simple fix that I'm using is to symlink /tmp/data to a directory on my NFS, where I have more storage, and then run it. It seems to run for...
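The symlink workaround can be sketched roughly as below. This is only an illustration, not the exact commands used: the helper name `link_scratch_dir` is made up for this example, and the NFS path in the usage note is a hypothetical placeholder. The demo at the bottom runs inside a temporary directory so it does not touch the real /tmp/data.

```python
import tempfile
from pathlib import Path


def link_scratch_dir(link_path, target_dir):
    """Make link_path a symlink to target_dir (hypothetical helper).

    target_dir is created if missing. An existing non-symlink at
    link_path is never overwritten; an error is raised instead.
    """
    link = Path(link_path)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    if link.is_symlink():
        # Already a symlink: accept it only if it points where we want.
        if link.resolve() == target.resolve():
            return link
        raise FileExistsError(f"{link} already links somewhere else")
    if link.exists():
        # A real file or directory is in the way; leave it for the user.
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
    return link


if __name__ == "__main__":
    # Demo in a throwaway directory instead of the real /tmp/data.
    with tempfile.TemporaryDirectory() as tmp:
        big_storage = Path(tmp) / "nfs" / "scratch"  # stands in for the NFS dir
        data_link = link_scratch_dir(Path(tmp) / "data", big_storage)
        (data_link / "shard.bin").write_bytes(b"\0" * 16)
        # The file written through the link lives on the "large" storage.
        print((big_storage / "shard.bin").exists())
```

In a real run the call would look something like `link_scratch_dir("/tmp/data", "/path/to/nfs/scratch")`, with the second argument replaced by whatever directory actually has the free space; it has to happen before the processing script starts writing to /tmp/data.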

Hey @Andrei-Aksionov @rasbt, I was trying to set up a multi-node run via SLURM and was testing this on 2 nodes with an Ethernet-based interconnect; however, the init fails -...

nvidia-smi on the nodes (before the timeout-based crash): ![image](https://github.com/user-attachments/assets/24c6e37f-739d-4136-986b-3cd5d355ffc4) ![image](https://github.com/user-attachments/assets/4fffca8f-8dd9-499a-acdc-309bca003bf5) So one of the nodes doesn't really load anything.