Andy0422
@HandH1998 Hi, thank you for your kind help. I encountered another problem with the calibration data. From my test results below, the results with wikitext2 seem OK, and the results...
> @Andy0422 We used pile for smoothing and wikitext2 for gptq in our paper. But the current code fixes this by using the same dataset for both smoothing...
> @Andy0422 It is probably correct. @HandH1998 One more question: do you employ the online Hadamard transform before the down_proj, or ignore all the online transforms in your implementation? If...
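For context, the online Hadamard transform asked about above (applied to activations right before `down_proj` in rotation-based schemes such as QuaRot) can be sketched with a plain fast Walsh-Hadamard transform. This is an illustrative pure-Python version, not the fused CUDA kernel a real implementation would use:

```python
def fwht(vec):
    """In-place fast Walsh-Hadamard transform (length must be a power of 2).

    In rotation-based quantization schemes, a transform like this is fused
    into the kernel feeding down_proj, spreading activation outliers across
    channels before quantization.
    """
    a = list(vec)
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of 2"
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

# Applying the transform twice recovers the input scaled by n,
# since the Hadamard matrix H satisfies H @ H = n * I.
```

The orthonormal variant used in practice divides by sqrt(n) so the rotation preserves norms; kernels typically fold that scale into the quantization scales.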
I also met this problem... Many thanks!
> I also met this problem... Many thanks! My problem is that when I use torchao to quantize the Wan2.1 model, it is incompatible with FSDP.
> [@bys0318](https://github.com/bys0318) I was trying to reproduce the results for deepseek-r1. May I know what value of `max_new_tokens` you used? The default `128` results in a cutoff on the...
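As a side note on why the default matters for reasoning models: the whole chain of thought is emitted inside the generation budget, so a small `max_new_tokens` truncates the answer before the final result appears. A toy sketch of the effect (placeholder token counts, not the real decoding loop):

```python
def generate(tokens_needed, max_new_tokens):
    """Toy stand-in for a decoding loop: the 'model' wants to emit
    tokens_needed tokens, but generation stops at max_new_tokens."""
    emitted = min(tokens_needed, max_new_tokens)
    truncated = emitted < tokens_needed
    return emitted, truncated

# A long chain-of-thought answer (say ~1000 tokens) is cut off at the
# default budget of 128, while a larger budget lets it through intact.
```

With reasoning models the budget has to cover the thinking tokens plus the final answer, which is why values in the thousands are common.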
> Hi, for reasoning models such as OpenAI o1 and DeepSeek R1, the w/ CoT setting is not necessary, as these models automatically output their thinking process whether prompted or...
> > I also encountered this problem, did you solve it? Thank you
>
> I get a port error; how do I solve it? Thanks.
> @brisker It is normal that w4a8 first-token latency is slower than w8a8, since the additional dequant operation (on the slower CUDA cores) of w4a8 slows down the main loop, even though...
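To make that trade-off concrete, here is a back-of-the-envelope comparison for a single hypothetical 4096x4096 layer (sizes are illustrative only): w4a8 halves the weight bytes moved, which helps memory-bound decode, but prefill (first token) is compute-bound, so the extra int4-to-int8 unpack/dequant step dominates there.

```python
# Hypothetical single weight matrix; sizes are illustrative only.
rows, cols = 4096, 4096

w8a8_weight_bytes = rows * cols          # int8: 1 byte per element
w4a8_weight_bytes = rows * cols // 2     # int4: two elements packed per byte

# Decode (memory-bound): w4a8 moves half the weight bytes per token.
bytes_saved = w8a8_weight_bytes - w4a8_weight_bytes

# Prefill (compute-bound): both run the same int8 tensor-core math, but
# w4a8 must first unpack/dequantize int4 -> int8 on the slower CUDA cores,
# so its first-token latency can be worse despite the smaller weights.
```

This is why mixed-precision serving stacks sometimes use w8a8 kernels for prefill and w4a8 kernels for decode.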
@jerryzh168 is there any update for this issue? Cheers!