Chan Kha Vu comments

Results: 9 comments by Chan Kha Vu

What is the recommended way of using webdataset with pytorch-lightning and ddp?

@harpone @isty2e Were you able to get a working solution for PyTorch XLA + TPU + Lightning?

What is the recommended way of using webdataset with pytorch-lightning and ddp?

@harpone I was able to adapt WebDataset for PyTorch-Lightning + TPU. However, I'm getting mysterious runtime errors (not related to webdataset in any way), so I'm also giving up...

where are the original directories ...?

Seems like I renamed `configs/toy_experiments` to `configs/trinist` (i.e. "Triple *-NIST"). I created this framework as part of my thesis and will definitely come back to it soon. Quick note -...

Quantizing DeepSeek-R1-Distill-Qwen-7B produces garbage and repetitive tokens

@FeSens Hi, were you able to fix this? @casper-hansen it would be awesome if you could share how you quantized the DeepSeek-Distill models and which autoawq version you used!

Quantizing DeepSeek-R1-Distill-Qwen-7B produces garbage and repetitive tokens

@tvmsandy33 yes, I tried supplying some reasoning traces produced by the BF16 model. I kept the prompt template, system prompt, etc. the same as at evaluation time (where it...

Keras - Supporting load/save models and weights to Google Storage

What is the status of this one so far? Would love to jump in.

[PPO] feat: Add LoRA support for PPO

@skepsun I suppose you have this issue after merging this branch with Main locally? This is because in the sharding manager, the dict holds pointers to un-sharded tensors, but when...

Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32.

I wish I had seen this thread sooner, it would've saved me 30 minutes of debugging! May I ask why FP32 is enforced as the default, instead of just using the model's own dtype?...

Abnormal TIR experiment results

Hi, same for me, for both 1.5B and 7B. My numbers so far:

| Model | MATH CoT | MATH CoT (maj@8) | MATH TIR |
|:-----:|:--------:|:----------------:|:--------:|
| Qwen2.5-1.5B-Instruct |...