Chan Kha Vu
@harpone @isty2e Were you able to get a working solution for PyTorch XLA + TPU + Lightning?
@harpone I was able to adapt WebDataset for PyTorch-Lightning + TPU. However, I'm getting mysterious runtime errors (that aren't related to webdataset in any way), so I'm also giving up...
Seems like I renamed `configs/toy_experiments` to `configs/trinist` (i.e. "Triple *-NIST"). I created this framework as part of my thesis, and will definitely come back to this one soon. Quick note -...
@FeSens Hi, were you able to fix this? @casper-hansen would be awesome if you could please share how/which autoawq version you used while quantizing the DeepSeek-Distill models!
@tvmsandy33 yes, I tried to supply some reasoning traces produced by the BF16 model. The prompt template, system prompt, etc. I kept the same as during evaluation time (where it...
What is the status of this one so far? Would love to jump in.
@skepsun I suppose you have this issue after merging this branch with Main locally? This is because in the sharding manager, the dict holds pointers to un-sharded tensors, but when...
I wish I had seen this thread sooner, would've saved 30 mins of debugging! May I ask why FP32 is enforced as the default, instead of just using whatever dtype the model has?...
Hi, same for me, for both 1.5B and 7B. My numbers so far:

| Model | MATH CoT | MATH CoT (maj@8) | MATH TIR |
|:--------:|:---------------:|:---------------------------:|:--------------:|
| Qwen2.5-1.5B-Instruct |...