Andy0422
> see some examples in https://github.com/pytorch/ao/blob/main/test/float8/test_fsdp.py
>
> > we'll be using `quantize_` API everywhere, but maybe not yet for
> > [ao/test/float8/test_fsdp.py](https://github.com/pytorch/ao/blob/137b0795acb3282ce622948b1537e20914186eea/test/float8/test_fsdp.py#L88), line 88 in [137b079](/pytorch/ao/commit/137b0795acb3282ce622948b1537e20914186eea)
> > ...
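For context, a minimal sketch of how torchao's `quantize_` API is invoked; the toy model and the `int8_weight_only` config are illustrative choices only, not what the float8 FSDP test itself uses (that test currently goes through the float8 training conversion path instead):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Toy stand-in for a real model; bfloat16 is a common choice, but exact
# dtype/device requirements can vary by torchao version and backend.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 8),
).to(torch.bfloat16)

# quantize_ mutates the model in place, swapping Linear weights for
# quantized tensor subclasses according to the given config.
quantize_(model, int8_weight_only())

out = model(torch.randn(2, 64, dtype=torch.bfloat16))
print(out.shape)
```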
> @cli99 We share the wikitext2 PPL results below.
>
> | Granularity | Method | Llama2-7b | Llama2-13b | Llama3-8b |
> | --- | --- | --- | --- | --- |
> | per-channel | smooth + GPTQ | 5.9683 | 5.2091 | 7.4474 |
> | per-channel | rotation + GPTQ | 5.6872 | ... | ... |
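For reference, a hedged sketch of how wikitext2 perplexity is commonly measured with Hugging Face `transformers`/`datasets`; the model id, window length, and fp16 setting are assumptions for illustration, not necessarily the setup that produced the numbers above:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative choice
seq_len = 2048                          # illustrative window length

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the wikitext2 test split and score it in fixed-size windows.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

nlls = []
for i in range(0, ids.size(1) - seq_len, seq_len):
    chunk = ids[:, i : i + seq_len].to(model.device)
    with torch.no_grad():
        # Labels equal to inputs -> mean next-token cross-entropy per window.
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float())

ppl = torch.exp(torch.stack(nlls).mean())
print(f"wikitext2 PPL: {ppl.item():.4f}")
```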
> when `max_seq_lengths` is set to 2048, the program will hang on a `while true` loop forever. 4096 or beyond works normally

Yes, 4096 is the shortest length in RULER.
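A simplified illustration of that failure mode (a stand-in for RULER's synthetic-data generation, not its actual code): every sample needs at least 4096 tokens of context, so a smaller length budget can never be satisfied and the retry loop spins forever.

```python
import random

MIN_RULER_LEN = 4096  # shortest sequence length supported by RULER

def build_sample(max_seq_length: int) -> str:
    """Simplified stand-in: redraw candidates until one fits the length budget."""
    while True:
        # Each synthetic sample requires at least MIN_RULER_LEN tokens,
        # so a budget below that threshold can never be met.
        n_tokens = random.randint(MIN_RULER_LEN, MIN_RULER_LEN + 512)
        if n_tokens <= max_seq_length:
            return " ".join(["token"] * n_tokens)

# build_sample(4096)  # terminates
# build_sample(2048)  # loops forever -- the reported hang
```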
> I didn't change the logic of pred.py, only modified it to load the model locally, using qwen2.5-instruct. But across repeated inference runs there are always 100+ samples with empty answers. Why is that?

Same problem here.