You can test with this gist: https://gist.github.com/mobicham/701dd564c52590203ee09631425ad797
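For reference, the flow being tested looks roughly like the following. This is a minimal sketch, not the gist itself: the model id and the `nbits`/`group_size` values are illustrative assumptions, though `HqqConfig` is the real transformers entry point for HQQ quantization.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative model id, not the one from the gist

# 4-bit HQQ on-the-fly quantization (typical values, assumed here)
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```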
@ArthurZucker just a friendly reminder to review this PR when you have a moment. Let me know if you need any clarifications or if there’s anything I can help with....
@rohit-gupta thanks for flagging!
@blap is this related to the latest transformers changes? Otherwise, which hqq version causes this?
> > @blap is this related to the latest transformers changes? Otherwise, which hqq version causes this?
>
> I think so. I didn't have this problem in the release...
Can anyone from the HF team track down this problem, please? What changed? Nothing much changed on the hqq lib side.
@blap why don't you use the latest release? It worked fine last time I tried (last week).
@blap `4.47.0` works for sure
Any timeline for this? We would love to push a quantized version!
@naiveen what are you trying to optimize exactly? In practice, you need torch.compile / CUDA graphs end-to-end in your model to optimize inference, because there's overhead to launch the Triton...
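For context, the end-to-end compilation being referred to follows the standard transformers recipe, sketched below. It assumes `model` and `inputs` are set up as in the earlier sketch; the static KV cache gives fixed tensor shapes so the decode step can be captured as a single graph.

```python
import torch

# A static KV cache keeps tensor shapes fixed, which lets torch.compile
# capture the decode step as one CUDA graph via "reduce-overhead" mode.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

# The first calls are slow (compilation). Subsequent decode steps avoid the
# per-kernel launch overhead that dominates when running many small Triton kernels.
out = model.generate(**inputs, max_new_tokens=32)
```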