mobicham
You can select the GPUs you want to use via `CUDA_VISIBLE_DEVICES=0 ipython3`. What model and GPUs are you trying to use? If you want to use a multi-GPU runtime, I...
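For reference, a minimal sketch of doing the same thing from inside Python; the variable has to be set before `torch` initializes CUDA:
```Python
# Minimal sketch: restrict which GPUs are visible to the process.
# CUDA_VISIBLE_DEVICES must be set before torch touches the CUDA runtime.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0

import torch
print(torch.cuda.device_count())  # reports 1 with the setting above
```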
Can you please share a code snippet of the model you are trying to use and your system settings (what GPUs does your machine have?)
Strange, try this:
```Python
import torch
from transformers import AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel
from hqq.utils.patching import *
from hqq.core.quantize import *
from hqq.utils.generation_hf import HFGenerator

#Load the model
###################################################...
```
Thanks! It should be similar to `.cuda()` but would use `.to('cpu')` instead: https://github.com/mobiusml/hqq/blob/b1a7c0698b2c323bfa55a2b4a110c8f3636fade7/hqq/core/quantize.py#L472-L535 Right now it is a mess because we support quantizing the scale/zero values and support offloading them...
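To illustrate the idea only (a rough sketch; the attribute names below are hypothetical and not hqq's actual internals): the CPU path would mirror the `.cuda()` traversal, just calling `.to('cpu')` on the packed weights and the scale/zero tensors.
```Python
import torch

# Rough sketch only: `W_q`, `scale`, `zero` are illustrative attributes,
# not hqq's actual layer internals.
class OffloadableLayerSketch:
    def __init__(self, W_q, scale, zero):
        self.W_q, self.scale, self.zero = W_q, scale, zero

    def cuda(self):
        # Move the packed weights and quantization meta-data to the GPU.
        self.W_q, self.scale, self.zero = (t.cuda() for t in (self.W_q, self.scale, self.zero))
        return self

    def to_cpu(self):
        # Same traversal as .cuda(), just targeting the CPU device instead.
        self.W_q, self.scale, self.zero = (t.to('cpu') for t in (self.W_q, self.scale, self.zero))
        return self
```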
This is an old issue, already resolved.
Thanks a lot for the effort @fahadh4ilyas! That is correct; as a temporary solution, there's this patching function that adds a quant_config: https://github.com/mobiusml/hqq/blob/master/hqq/utils/patching.py#L29 There's an easy way to do...
Yeah I thought about it, but it will make things even more complicated, since it will require more work on the `transformers` lib side. Putting everything in `state_dict` simplifies the...
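As a conceptual sketch of why this simplifies things (not hqq's actual serialization code): if the packed weights and the quantization meta-data are registered as buffers, they land in `state_dict()` automatically and travel through the standard save/load path with no extra plumbing on the `transformers` side.
```Python
import torch
import torch.nn as nn

class QuantizedLinearSketch(nn.Module):
    # Conceptual example only, not hqq's real layer.
    def __init__(self, W_q, scale, zero):
        super().__init__()
        # register_buffer puts these tensors into state_dict() automatically.
        self.register_buffer("W_q", W_q)
        self.register_buffer("scale", scale)
        self.register_buffer("zero", zero)

layer = QuantizedLinearSketch(torch.zeros(8, 8, dtype=torch.uint8),
                              torch.ones(8, 1), torch.zeros(8, 1))
torch.save(layer.state_dict(), "layer.pt")   # everything travels together
layer.load_state_dict(torch.load("layer.pt"))
```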
hqq's `save_quantized` wouldn't require changes in transformers, that's correct, but the goal is to have official serialization support in HF transformers directly, so we would be able to save models...
I also tried loading a model saved with the previous version (https://huggingface.co/mobiuslabsgmbh/Llama-2-7b-chat-hf_4bitnogs_hqq) and it worked without any issue, which is good news for backward compatibility. Now we just need to...
Draft pull request here: https://github.com/huggingface/transformers/pull/32056