Joe Cummings
It looks like Axolotl makes use of the map and filter functions on the Hugging Face Dataset abstraction, which is pretty neat. That way they can just set a default...
@andrewkho But how long would it take to get streaming packing in vs. the straightforward torchdata.nodes approach for map-style datasets? Map-style will always need to be supported, so it wouldn't...
> @joecummings agree that both are still necessary. It's going to be similar foundational work: land some version of streaming packer (could be in torchdata nodes), and then setting up...
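For readers unfamiliar with the term, a streaming packer can be sketched roughly as below. This is only a minimal illustration of greedy packing over an iterable, not the torchtune or torchdata.nodes implementation; `pack_stream` and `max_seq_len` are hypothetical names, and truncating oversized samples is a simplifying assumption.

```python
from typing import Iterable, Iterator, List


def pack_stream(samples: Iterable[List[int]], max_seq_len: int) -> Iterator[List[int]]:
    """Greedily pack token sequences from a stream into packs of at most max_seq_len tokens.

    Hypothetical helper for illustration only; it never needs the dataset length,
    which is what lets it work on streaming (iterable) data.
    """
    current: List[int] = []
    for tokens in samples:
        # Simplifying assumption: truncate samples longer than one pack.
        tokens = tokens[:max_seq_len]
        # If the next sample does not fit, emit the current pack and start a new one.
        if current and len(current) + len(tokens) > max_seq_len:
            yield current
            current = []
        current = current + tokens
    # Emit whatever is left once the stream is exhausted.
    if current:
        yield current


packs = list(pack_stream([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_seq_len=5))
```

Because the function only iterates once, the same code also accepts a map-style dataset; a real packer would additionally track sample boundaries for attention masking and position IDs.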
Did you fine-tune the 14B model on your desired dataset first? That's an important pre-step to knowledge distillation.
> Sorry I didn't, I mistakenly thought it was not important.

All good - give that a go and LMK how it works after re-evaluating.
Can you share a few more details around which models you're using, size of dataset, machine type? Off the very top of my head, not sure what would be going...
Hey @Delaunay - I looked into this and was able to repro! Unfortunately, still digging into the root cause, but a quick fix is to upgrade your PyTorch version to...
This can be closed b/c we have a new stable release that should fix this issue.
@joecummings to verify that this is still true
@felipemello1 @ebsmothers Will this not pass on PyTorch 2.5 b/c of the issue with cuDNN? This test passes locally on PyTorch v2.5.1. Do we know when the patch will be...