mobicham

113 comments by mobicham

Oh, in the review, you don't see this? https://github.com/mobiusml/hqq/pull/116/files/631ea011d8432b8a76518b0adc072574969d8771

I just tried this one and it compiles without graph breaks:
```Python
@torch.inference_mode()
def optimize_weights_proximal_legacy(
    tensor: Tensor,
    scale: Tensor,
    zero: Tensor,
    min_max: list,
    axis: int = 0,
    device: Union[str, None]...
```
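For what it's worth, a quick way to verify the no-graph-break claim is to compile with `fullgraph=True`, which errors out on any graph break. The function below is a toy stand-in for the proximal step, not the actual implementation:

```Python
import torch

@torch.inference_mode()
def proximal_step(tensor, scale, zero):
    # Toy stand-in: quantize with scale/zero, then dequantize.
    w_q = torch.round(tensor * scale + zero)
    return (w_q - zero) / scale

# fullgraph=True forces torch.compile to raise on any graph break,
# so a successful call confirms the function traces as a single graph.
compiled = torch.compile(proximal_step, fullgraph=True)
out = compiled(torch.randn(64, 64), torch.ones(64, 1), torch.zeros(64, 1))
```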

Works with `Quantizer.quantize` compiled as well! I suggest we do the following:
- We remove the `@torch.compile` decorator and do the compilation outside, either like this:
```Python
Quantizer.optimize_weights = torch.compile(Quantizer.optimize_weights)
```
...
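A sketch of how that external compilation could be kept opt-in (the import path and the flag are assumptions, not hqq's actual API):

```Python
import torch
from hqq.core.quantize import Quantizer  # assumed import path

USE_COMPILE = True  # hypothetical opt-in flag

if USE_COMPILE:
    # Wrap the optimizer once at setup time instead of decorating it,
    # so users who don't want compilation simply skip this step.
    Quantizer.optimize_weights = torch.compile(Quantizer.optimize_weights)
```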

Closing this PR since there has been no update for over a year, but please feel free to open a new one!

@void-main Trying your code above, but there are a couple of issues:
* `cached_bin` doesn't have a `c_wrapper()` call
* `bin` doesn't have attributes like `num_ctas`, `clusterDims`, etc.

Do you have...

So I have been looking into this issue, and I can confirm that @xinji1's solution does work to some extent. However, calling `run()` directly is not compatible with torch.compile. Instead, I found...
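As an aside, one way to see where a direct `run()` call would break the compiled graph is `torch._dynamo.explain`. The function below is a stand-in, not the actual Triton call:

```Python
import torch

def call_kernel(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for code that invokes a cached Triton binary's run() directly;
    # such opaque calls typically surface as graph breaks in the report below.
    return x * 2

explanation = torch._dynamo.explain(call_kernel)(torch.randn(8))
print(explanation.graph_break_count)  # 0 here; >0 when tracing hits an opaque call
```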

```
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install --upgrade git+https://github.com/mobiusml/hqq.git
```

Hi! What kind of quantization is GGUF using? If it's asymmetric quantization (with both scales and zeros), it could be converted.
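For context, asymmetric quantization keeps both a scale and a zero-point, so a round-trip looks like this generic sketch (not GGUF's or HQQ's exact formulation):

```Python
import torch

def quantize_asym(w: torch.Tensor, n_bits: int = 4):
    # Per-tensor asymmetric affine quantization: scale + zero-point.
    w_min, w_max = w.min(), w.max()
    qmax = 2**n_bits - 1
    scale = (w_max - w_min) / qmax
    zero = torch.round(-w_min / scale)
    w_q = torch.clamp(torch.round(w / scale + zero), 0, qmax)
    return w_q, scale, zero

def dequantize_asym(w_q, scale, zero):
    # Inverse map: shift by the zero-point, then rescale.
    return (w_q - zero) * scale
```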

Thanks for sharing. It looks like the logic is quite different, so unfortunately I don't think the two quantized outputs are compatible.

It seems this is more of a transformers issue: it's not an official transformers model (it requires `trust_remote_code=True`), so it's difficult to make sure everything works fine. The model is actually...
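For reference, such models have to be loaded with their custom code explicitly enabled; the model id below is a placeholder:

```Python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/custom-model"  # placeholder: any model shipping custom modeling code

# trust_remote_code=True opts in to running the modeling code bundled
# with the checkpoint, which official transformers models don't need.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```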