Aflah
Thanks! Sorry I missed replying earlier.
Hey, I just happened to stumble on this PR by accident. I noticed that you're using imports from the Liger library; however, I believe it is now deeply integrated within...
Ah, nice! Yep, it seems like an easy thing to fix, based on the discussion under that issue.
Hi, are there any benchmarks I can refer to for the speedups brought by TE in a mixed-precision BF16-FP32 training run?
CC: @rasbt @Andrei-Aksionov, just bumping this onto your radar since it is a continuation of the OLMo PR.
Sure, I'll do that.
I tried using the code to process the dataset; however, it doesn't seem to work for the train set due to size issues. Is there a way to reduce how...
A simple fix I'm using is to symlink /tmp/data to my NFS, where I have more storage, and then run it. It seems to run for...
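For anyone hitting the same disk-space issue, here is a minimal sketch of that symlink workaround in Python. The helper names and the NFS path are hypothetical placeholders, not part of the repo; it only assumes that the processing script writes to /tmp/data and that the NFS mount is a plain directory.

```python
import shutil
from pathlib import Path


def redirect_to_larger_storage(link_path: str, target_dir: str) -> Path:
    """Hypothetical helper: make `link_path` a symlink to `target_dir` (e.g. on NFS)."""
    link = Path(link_path)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # the real data lives on the larger filesystem
    if link.is_symlink():
        link.unlink()  # replace an existing (possibly stale) symlink
    elif link.exists():
        # Refuse to clobber a real directory; its contents would need moving first.
        raise FileExistsError(f"{link} exists and is not a symlink; move its contents first")
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
    return link


def free_gib(path: str) -> float:
    """Free space in GiB on the filesystem that actually backs `path` (symlinks are followed)."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    # Hypothetical NFS location; replace with your own mount point.
    nfs_dir = "/nfs/your_user/litgpt_data"
    print(f"Free space on NFS target: {free_gib(str(Path(nfs_dir).parent)):.1f} GiB")
```

Usage would be something like redirect_to_larger_storage("/tmp/data", "/nfs/your_user/litgpt_data") before launching the processing script; checking free_gib("/tmp/data") afterwards confirms that writes now land on the NFS filesystem rather than the local /tmp.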
Hey @Andrei-Aksionov @rasbt, I was trying to set up a multi-node run via SLURM and was testing it on 2 nodes with an Ethernet-based interconnect; however, the init fails -...
nvidia-smi on the nodes (before the timeout-based crash) shows that one of the nodes doesn't really load anything.
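One thing that may be worth ruling out when one node sits idle until the init times out is a mismatch between the SLURM task layout and the intended number of processes. Below is a minimal sketch of such a check, assuming the launcher expects one SLURM task per GPU (as Lightning does when it detects SLURM). The function name and the 8-GPUs-per-node value are hypothetical; only the SLURM_* environment variables are real.

```python
import os
from typing import Mapping


def check_slurm_layout(env: Mapping[str, str], num_nodes: int, devices_per_node: int) -> list[str]:
    """Hypothetical helper: list mismatches between the SLURM allocation and
    the intended layout of num_nodes x devices_per_node processes (one per GPU)."""
    problems = []
    nnodes = int(env.get("SLURM_NNODES", "0"))
    ntasks = int(env.get("SLURM_NTASKS", "0"))
    per_node = env.get("SLURM_NTASKS_PER_NODE")  # set when --ntasks-per-node is given

    if nnodes != num_nodes:
        problems.append(f"SLURM_NNODES={nnodes}, expected {num_nodes}")
    if ntasks != num_nodes * devices_per_node:
        problems.append(f"SLURM_NTASKS={ntasks}, expected {num_nodes * devices_per_node}")
    if per_node is None:
        problems.append("SLURM_NTASKS_PER_NODE is unset; pass --ntasks-per-node explicitly")
    elif per_node != str(devices_per_node):
        problems.append(f"SLURM_NTASKS_PER_NODE={per_node}, expected {devices_per_node}")
    return problems


if __name__ == "__main__":
    # Hypothetical layout: 2 nodes with 8 GPUs each; adjust to the real hardware.
    issues = check_slurm_layout(os.environ, num_nodes=2, devices_per_node=8)
    rank_info = {k: os.environ.get(k) for k in ("SLURM_NODEID", "SLURM_PROCID", "SLURM_LOCALID")}
    print(rank_info, issues or "layout looks consistent")
```

Launched with srun inside the job, every task prints its node and rank IDs, which makes it easy to see whether any tasks were placed on the idle node at all. With an Ethernet-only interconnect, NCCL's NCCL_SOCKET_IFNAME and NCCL_DEBUG=INFO environment variables may also help narrow down where the init hangs.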