mobicham
Closing this since we are very close to full transformers serialization support: https://github.com/huggingface/transformers/pull/33141
Hi, I don't know whether CUDA 12.4 is supported by the torch nightlies. Can you try:
- CUDA 12.1
- `pip install torch==2.5.0.dev20240905+cu121 --index-url https://download.pytorch.org/whl/nightly/cu121`
- `pip install hqq` (install hqq...
@larin92 did you set your environment to use CUDA 12.1? Make sure you are using the right version:
```bash
export CUDA_HOME=/usr/local/cuda-12.1 # or the path where you have cuda-12.1...
```
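After installing, a quick sanity check (a generic snippet, nothing specific to this issue) confirms which CUDA build torch actually picked up:
```Python
# Quick sanity check: confirm the CUDA toolkit this torch build was compiled
# against and that the GPU is visible.
import torch

print(torch.__version__)          # e.g. 2.5.0.dev20240905+cu121
print(torch.version.cuda)         # should report 12.1 for a cu121 wheel
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))
```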
Thanks @larin92! Will close the issue unless you face issues with Ampere GPUs and above.
Hey, sorry for the delay, I am traveling this week; I will try to debug it when I get back home.

If you are using a language-only model:
* If...
I tried with Qwen, and it works fine like this; I had to change the chat template a bit since Qwen has a default system prompt:
```Python
#pip install torch==2.4.1 hqq; #2.4.1+cu124
#OMP_NUM_THREADS=16...
```
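For reference, the template handling boils down to something like this; the model id and messages here are placeholders (not the exact ones from my script), and `apply_chat_template` is the standard transformers API:
```Python
# Minimal sketch of building a Qwen prompt with transformers' chat template.
# The model id and messages are placeholders, not the exact ones used above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

messages = [
    # Supplying a system message here overrides Qwen's default one.
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```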
I tried the latest nightly build; it breaks with another error:
```
  File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 1763, in __init__
    self._init(nodes)
  File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 1815, in _init
    self.dead_node_elimination()
  File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 2169,...
```
@davidberard98 thank you for your input! I just tried again today and it's finally working properly with the following versions:
```
torch: 2.6.0.dev20241112+cu121
triton: triton-3.2.0+git0bd30a2f
```
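In case it helps anyone else hitting this, a generic check (nothing thread-specific) to confirm the installed pair:
```Python
# Print the exact torch/triton pair to confirm the working combination.
import torch
import triton

print("torch:", torch.__version__)
print("triton:", triton.__version__)
```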
Sorry to reopen, but I am getting `std::bad_alloc` again; it seems to happen only on some machines. The one I used above was a 4090 and it worked; today I tried...
So I tried many nightly builds from November; the last one that works is `torch==2.6.0.dev20241101+cu121`. Something has been broken since `torch==2.6.0.dev20241102+cu121`.
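For anyone wanting to reproduce the bisection, here is a rough sketch of one way to walk the November nightlies (not my exact commands); `test_generate.py` is a placeholder for whatever script triggers the failure:
```Python
# Rough bisection sketch: install each cu121 nightly in turn and run the
# failing script in a fresh subprocess. "test_generate.py" is a placeholder
# for whatever reproduces the std::bad_alloc; pip will raise (check=True)
# on dates where no nightly wheel was published.
import subprocess
import sys

NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/cu121"

for day in range(1, 13):
    version = f"2.6.0.dev202411{day:02d}+cu121"
    subprocess.run(
        [sys.executable, "-m", "pip", "install", f"torch=={version}",
         "--index-url", NIGHTLY_INDEX],
        check=True,
    )
    result = subprocess.run([sys.executable, "test_generate.py"])
    print(version, "OK" if result.returncode == 0 else "FAILED")
```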