About the implementation of .cpu()
Thanks for your work! May I ask when you expect to implement the .cpu() method of HQQLinear? Or could you briefly describe how to implement it? I can implement it myself and submit a PR: https://github.com/mobiusml/hqq/blob/b1a7c0698b2c323bfa55a2b4a110c8f3636fade7/hqq/core/quantize.py#L563
Thanks! It should be similar to .cuda() but instead would use .to('cpu'): https://github.com/mobiusml/hqq/blob/b1a7c0698b2c323bfa55a2b4a110c8f3636fade7/hqq/core/quantize.py#L472-L535
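For reference, here is a rough sketch of what such a .cpu() method could look like. It is only a sketch under assumptions, not the actual implementation: it assumes the layer stores its packed quantized weights in `self.W_q`, its scale/zero metadata in a `self.meta` dict, and an optional `self.bias`, mirroring what `.cuda()` touches in quantize.py.

```python
import torch

# Rough sketch only, not the library's actual method.
# Assumed attribute names: self.W_q (packed weights), self.meta (dict with
# scale/zero and related tensors), self.bias, self.device.
def cpu(self):
    # Move the packed quantized weights to the CPU
    self.W_q = torch.nn.Parameter(self.W_q.data.to('cpu'), requires_grad=False)

    # Move any tensor entries of the metadata dict (scale, zero, ...) to the CPU
    for key, value in self.meta.items():
        if isinstance(value, torch.Tensor):
            self.meta[key] = value.to('cpu')

    # Move the bias if the layer has one
    if self.bias is not None:
        self.bias = self.bias.to('cpu')

    self.device = 'cpu'
    return self
```

Note that if the scale/zero values are themselves quantized (the case mentioned below), their nested metadata would need the same treatment, which is exactly what makes the current `.cuda()` code messy.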
Right now it is a bit of a mess because we support quantizing the scale/zero values and offloading them to the CPU.
I think in the future we are going to remove this, which should make things much easier: https://github.com/mobiusml/hqq/pull/93#issuecomment-2230605730
May I ask why you would need the .cpu() call? If you just want to use HQQLinear on the CPU, you can simply pass HQQLinear(..., device='cpu').
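For example, something along these lines should work; this is a sketch based on the usual README-style usage, and the exact constructor arguments may differ between hqq versions:

```python
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

# Sketch: quantize a regular linear layer directly on the CPU
# instead of moving it there afterwards with a .cpu() call.
linear = torch.nn.Linear(4096, 4096, bias=False)
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)

hqq_layer = HQQLinear(
    linear,
    quant_config=quant_config,
    compute_dtype=torch.float32,  # float32 tends to be the safer choice on CPU
    device='cpu',
)
```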
This is an old issue, already resolved.