mobicham

113 comments by mobicham

Cool! Yeah, unfortunately, since RWKV doesn't have official support in transformers, there are no guarantees it's gonna work. There's probably a workaround with the hqq lib, but it's not gonna be...
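
A minimal sketch of that kind of workaround, assuming you wrap the model's `nn.Linear` layers manually with `HQQLinear` (the layer below is a hypothetical stand-in; for RWKV you'd walk the module tree yourself and swap layers in place):

```python
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

# Hypothetical stand-in for one of the model's nn.Linear layers.
layer = torch.nn.Linear(2048, 2048)

# 4-bit weight quantization with per-group scaling/zero-point.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64)

# Quantized drop-in replacement; you'd assign this back into the model
# in place of the original layer.
qlayer = HQQLinear(layer, quant_config=quant_cfg, compute_dtype=torch.float16, device="cuda")
```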

Hey, have you found a solution to this?

Thanks @anijain2305! How can I test it? I tried with the nightly (2.6.0.dev20241027+cu121) but I get

```python
RuntimeError: invalid torch.compile stance 'DynamoStance(stance='skip_guard_eval', backend=None)'
```

On a separate note, I am...
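
For reference, a minimal sketch of what I was trying, assuming a build where the stance landed under the name `skip_guard_eval_unsafe` (treat the exact stance string as an assumption; the 2.6 nightly above rejected `skip_guard_eval`):

```python
import torch

@torch.compile
def f(x):
    return x * 2

# Warm up first so the compiled artifact and its guards already exist.
f(torch.randn(8))

# Assumption: on newer builds the stance is exposed as "skip_guard_eval_unsafe".
torch.compiler.set_stance("skip_guard_eval_unsafe")
f(torch.randn(8))  # runs without re-evaluating guards
```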

Hey, yes, that's a transformers bug, not an hqq one: https://github.com/huggingface/transformers/issues/41455

* Last time I checked, Marlin only supports symmetric quantization, while torchao xdtype implements asymmetric quantization (zero-point), so that's actually an issue: it would need zero-point support added. The code is...

> > There are also some open PRs in CUTLASS for signed and unsigned int4/int8 multiplication with activations in fp16 [NVIDIA/cutlass#1413](https://github.com/NVIDIA/cutlass/pull/1413) by @alexsamardzic
> >
> > @msaroufim [#1413](https://github.com/NVIDIA/cutlass/pull/1413) is `S8 x...

@jerryzh168 the quality tends to be worse with symmetric quantization compared to asymmetric. Much of the quality in linear quantization actually comes from the zero-point, not the scaling factor. I...
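
A minimal sketch of why the zero-point matters, using plain per-row affine quantization (this is an illustration, not torchao's or hqq's actual implementation): the zero-point lets the 16 int4 levels cover the weights' true `[min, max]` range, while the symmetric grid wastes range whenever the weights aren't centered at zero.

```python
import torch

def quantize_affine(w, nbits=4, axis=1):
    # Asymmetric (affine): q = round(w / scale + zero); the zero-point shifts
    # the grid so all 2^nbits levels cover [min, max] of the weights.
    qmax = 2**nbits - 1
    wmin, wmax = w.amin(axis, keepdim=True), w.amax(axis, keepdim=True)
    scale = (wmax - wmin) / qmax
    zero = -wmin / scale
    q = torch.clamp(torch.round(w / scale + zero), 0, qmax)
    return (q - zero) * scale  # dequantized reconstruction

def quantize_symmetric(w, nbits=4, axis=1):
    # Symmetric: no zero-point, grid centered at 0, so half the levels are
    # wasted whenever the weight distribution is shifted.
    qmax = 2 ** (nbits - 1) - 1
    scale = w.abs().amax(axis, keepdim=True) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = torch.randn(64, 64) + 0.5  # shifted weights favor the asymmetric grid
print((w - quantize_affine(w)).abs().mean(), (w - quantize_symmetric(w)).abs().mean())
```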

I see that you removed the pruning config; that will produce incorrect results for group sizes lower than 128.

It's already integrated in torchao: https://github.com/pytorch/ao/releases/tag/v0.5.0 So you just use `quantize_(model, int4_weight_only(group_size, use_hqq=True))`, for example.
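
A minimal usage sketch (the model here is a toy stand-in, and `group_size=64` is just an example value):

```python
import torch
from torchao.quantization import quantize_, int4_weight_only

# Toy stand-in for a real model; the int4 kernels expect bfloat16 on CUDA.
model = torch.nn.Sequential(torch.nn.Linear(4096, 4096)).to(torch.bfloat16).cuda()

# Swap every Linear weight for an int4 weight-only tensor, using HQQ to
# search for the quantization parameters instead of plain round-to-nearest.
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))
```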