mobicham

113 comments by mobicham

Cool! Yeah, unfortunately, since RWKV doesn't have official support in transformers, there are no guarantees it's gonna work. There's probably a workaround with the hqq lib, but it's not gonna be...
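
A minimal sketch of that kind of workaround, assuming you wrap the model's `nn.Linear` layers manually with `HQQLinear` (the layer below is a hypothetical stand-in; for RWKV you'd walk the module tree yourself and swap layers in place):

```python
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

# Hypothetical stand-in for one of the model's nn.Linear layers.
layer = torch.nn.Linear(2048, 2048)

# 4-bit weight quantization with per-group scaling/zero-point.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64)

# Quantized drop-in replacement; you'd assign this back into the model
# in place of the original layer.
qlayer = HQQLinear(layer, quant_config=quant_cfg, compute_dtype=torch.float16, device="cuda")
```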

Hey, have you found a solution to this?

Thanks @anijain2305! How can I test it? I tried with the nightly (2.6.0.dev20241027+cu121) but I get

```python
RuntimeError: invalid torch.compile stance 'DynamoStance(stance='skip_guard_eval', backend=None)'
```

On a separate note, I am...
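
For reference, a minimal sketch of what I was trying, assuming a build where the stance landed under the name `skip_guard_eval_unsafe` (treat the exact stance string as an assumption; the 2.6 nightly above rejected `skip_guard_eval`):

```python
import torch

@torch.compile
def f(x):
    return x * 2

# Warm up first so the compiled artifact and its guards already exist.
f(torch.randn(8))

# Assumption: on newer builds the stance is exposed as "skip_guard_eval_unsafe".
torch.compiler.set_stance("skip_guard_eval_unsafe")
f(torch.randn(8))  # runs without re-evaluating guards
```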

Hey, yes, that's a transformers bug, not an hqq one: https://github.com/huggingface/transformers/issues/41455

* Last time I checked, Marlin only supports symmetric quantization, while torchao xdtype implements asymmetric quantization (zero-point), so that's actually an issue: it would need zero-point support added. The code is...

> > There are also some open PRs in CUTLASS for signed and unsigned int4/int8 multiplication with activations in fp16 [NVIDIA/cutlass#1413](https://github.com/NVIDIA/cutlass/pull/1413) by @alexsamardzic
> >
> > @msaroufim [#1413](https://github.com/NVIDIA/cutlass/pull/1413) is `S8 x...

@jerryzh168 the quality tends to be worse with symmetric quantization compared to asymmetric. Much of the quality in linear quantization actually comes from the zero-point, not the scaling factor. I...
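
A minimal sketch of why the zero-point matters, using plain per-row affine quantization (this is an illustration, not torchao's or hqq's actual implementation): the zero-point lets the 16 int4 levels cover the weights' true `[min, max]` range, while the symmetric grid wastes range whenever the weights aren't centered at zero.

```python
import torch

def quantize_affine(w, nbits=4, axis=1):
    # Asymmetric (affine): q = round(w / scale + zero); the zero-point shifts
    # the grid so all 2^nbits levels cover [min, max] of the weights.
    qmax = 2**nbits - 1
    wmin, wmax = w.amin(axis, keepdim=True), w.amax(axis, keepdim=True)
    scale = (wmax - wmin) / qmax
    zero = -wmin / scale
    q = torch.clamp(torch.round(w / scale + zero), 0, qmax)
    return (q - zero) * scale  # dequantized reconstruction

def quantize_symmetric(w, nbits=4, axis=1):
    # Symmetric: no zero-point, grid centered at 0, so half the levels are
    # wasted whenever the weight distribution is shifted.
    qmax = 2 ** (nbits - 1) - 1
    scale = w.abs().amax(axis, keepdim=True) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = torch.randn(64, 64) + 0.5  # shifted weights favor the asymmetric grid
print((w - quantize_affine(w)).abs().mean(), (w - quantize_symmetric(w)).abs().mean())
```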

I see that you removed the pruning config; that will produce incorrect results for group sizes lower than 128.

It's already integrated in torchao: https://github.com/pytorch/ao/releases/tag/v0.5.0 So you just use `quantize_(model, int4_weight_only(group_size, use_hqq=True))`, for example.
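
A minimal usage sketch (the model here is a toy stand-in, and `group_size=64` is just an example value):

```python
import torch
from torchao.quantization import quantize_, int4_weight_only

# Toy stand-in for a real model; the int4 kernels expect bfloat16 on CUDA.
model = torch.nn.Sequential(torch.nn.Linear(4096, 4096)).to(torch.bfloat16).cuda()

# Swap every Linear weight for an int4 weight-only tensor, using HQQ to
# search for the quantization parameters instead of plain round-to-nearest.
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))
```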