mobicham

113 comments by mobicham

You can test with this gist: https://gist.github.com/mobicham/701dd564c52590203ee09631425ad797

@ArthurZucker just a friendly reminder to review this PR when you have a moment. Let me know if you need any clarifications or if there’s anything I can help with....

@rohit-gupta thanks for flagging!

@blap is this related to the latest transformers changes? Otherwise, which hqq version causes this?

> > @blap is this related to the latest transformers changes? Otherwise, which hqq version causes this?
>
> I think so. I didn't have this problem in the release...

Can anyone from the HF team track down this problem, please? What changed? Nothing much has changed on the hqq lib side.

@blap why don't you use the latest release? It worked fine the last time I tried (last week).

@blap transformers `4.47.0` works for sure
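
For anyone landing here, a trivial sketch of checking the installed version before debugging further (only the `4.47.0` pin comes from this thread; the check itself is just an illustration):

```python
import transformers

# Per the comments above, transformers 4.47.0 is known to work with hqq.
print(transformers.__version__)
assert transformers.__version__ == "4.47.0", (
    "consider pinning: pip install transformers==4.47.0"
)
```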

Any timeline for this ? We would love to push a quantized version!

@naiveen what are you trying to optimize exactly? In practice, you need torch.compile / CUDA graphs end-to-end in your model to optimize inference, because there's overhead to launch the Triton...
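
To make that concrete, here's a minimal sketch of compiling a model end-to-end with CUDA graphs via `torch.compile` (the toy model, shapes, and warm-up count are placeholders, not from the original comment):

```python
import torch
import torch.nn as nn

# Toy stand-in model (hypothetical; in the real case this would be the
# HQQ-quantized network whose Triton dequant kernel launches we want to amortize).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
model = model.cuda().half().eval()

# Compile the forward pass end-to-end; mode="reduce-overhead" captures it
# into CUDA graphs, so the per-call kernel-launch overhead is paid once.
compiled = torch.compile(model, mode="reduce-overhead", fullgraph=True)

with torch.no_grad():
    x = torch.randn(8, 512, device="cuda", dtype=torch.float16)
    for _ in range(3):   # warm-up: triggers compilation + graph capture
        compiled(x)
    out = compiled(x)    # later calls replay the captured CUDA graph
```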