Joe Cummings comments

Results: 278 comments by Joe Cummings

Add multiprocess dataset packing

It looks like Axolotl makes use of the `map` and `filter` functions on the Hugging Face `Dataset` abstraction, which is pretty neat. That way they can just set a default...

Add multiprocess dataset packing

@andrewkho But how long would it take to get streaming packing in vs. the straightforward torchdata.nodes approach for map-style datasets? Map-style will always need to be supported, so it wouldn't...

Add multiprocess dataset packing

> @joecummings agree that both are still necessary. It's going to be similar foundational work: land some version of streaming packer (could be in torchdata nodes), and then setting up...

what should I do if I want to improve the performance of hellaswag?

Did you fine-tune the 14B model on your desired dataset first? That's an important pre-step to knowledge distillation.

what should I do if I want to improve the performance of hellaswag?

> Sorry I didn't, I mistakenly thought it was not important.

All good - give that a go and LMK how it works after re-evaluating

v0.3 regression, full_finetune_distributed slower ?

Can you share a few more details around which models you're using, size of dataset, machine type? Off the very top of my head, not sure what would be going...

v0.3 regression, full_finetune_distributed slower ?

Hey @Delaunay - I looked into this and was able to repro! Unfortunately, still digging into the root cause, but a quick fix is to upgrade your PyTorch version to...

v0.3 regression, full_finetune_distributed slower ?

This can be closed b/c we have a new stable release that should fix this issue.

Loss not going down for fine-tuning Llama3-8B on C4

@joecummings to verify that this is still true

PTQ for `generate_v2`

@felipemello1 @ebsmothers Will this not pass on PyTorch 2.5 b/c of the issue with cuDNN? This test passes locally on PyTorch v2.5.1. Do we know when the patch will be...