davidray222

Search results: 8 issues by davidray222

Thank you for providing such an excellent solution. I have a small question: does auto-gptq only support reading models in .safetensors format? If I want to use a .bin format...

Thank you for providing this great work! I want to ask about merge.py: `import torch model_path = 'path of the quantized model' lora_path = 'path of the saved LoRA...`

Thank you for this excellent work!! May I ask a question? I quantized llama-7b and ran qalora.py, and hit a problem: the line `except triton.compiler.OutOfResources:` raises `AttributeError: module 'triton.compiler' has no attribute 'OutOfResources'`...
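That `AttributeError` usually means the installed Triton no longer exposes `OutOfResources` under `triton.compiler` (newer releases moved it; in some versions it appears under `triton.runtime.errors`). A minimal, version-tolerant lookup, sketched here against a stand-in module rather than a real Triton install (the stand-in and the helper name are illustrative, not part of the repo):

```python
from types import SimpleNamespace

def find_out_of_resources(triton_module):
    """Locate Triton's OutOfResources exception across versions.

    Older Triton: triton.compiler.OutOfResources
    Some newer releases: triton.runtime.errors.OutOfResources
    Falls back to the generic Exception if neither attribute exists.
    """
    for path in ("compiler", "runtime.errors"):
        obj = triton_module
        for part in path.split("."):
            obj = getattr(obj, part, None)
            if obj is None:
                break
        exc = getattr(obj, "OutOfResources", None)
        if exc is not None:
            return exc
    return Exception

# Demo with a stand-in mimicking a newer Triton layout (hypothetical module).
class OutOfResources(Exception):
    pass

fake_triton = SimpleNamespace(
    runtime=SimpleNamespace(errors=SimpleNamespace(OutOfResources=OutOfResources))
)
print(find_out_of_resources(fake_triton) is OutOfResources)  # → True
```

Catching the class returned by such a lookup keeps the `except` clause working across Triton versions instead of hard-coding one attribute path.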

Thank you for providing such excellent methods. I would like to ask how you load and use the quantized models. Thank you!

Thank you for providing such outstanding research! I tested the llama7b model, and after pruning, neither the memory usage nor the inference speed is significantly different from the original model....
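One common explanation for that observation (an illustration, not a claim about this repository's method): if the pruning is unstructured and only zeroes individual weights, the tensors stay dense, so neither storage nor dense-matmul time shrinks unless the zeros are exploited by a sparse format or by structured removal. A small numpy sketch:

```python
import numpy as np

# A dense "weight matrix" before pruning.
w = np.random.rand(512, 512).astype(np.float32)
bytes_before = w.nbytes

# Unstructured magnitude pruning: zero out roughly half the entries.
w[np.abs(w) < 0.5] = 0.0

# The zeros are still stored explicitly, so memory is unchanged,
# and a dense matmul would still touch every entry.
print(w.nbytes == bytes_before)  # → True
```

Realized savings typically require converting to a sparse representation or physically removing rows/columns/heads, which is why zeroed-weight pruning alone often shows no memory or latency benefit.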

Hello, thank you for providing such a useful method, but I encountered some problems while pruning llama-7b. Environment: python 3.10, torch 2.6.0, transformers 4.49.0, accelerate 1.5.2. My command: `python prune_llm.py --model...`

Hello, thank you for providing such a useful method, but I encountered some problems while pruning llama-7b. Environment: python 3.10, torch 2.6.0, transformers 4.49.0, accelerate 1.5.2. Command: `python prune_llm.py --model huggyllama/llama-7b...`

**Describe the bug**
```
INFO Packing model...
INFO Packing Kernel: Auto-selection: adding candidate `TorchQuantLinear`
INFO Kernel: candidates -> `[TorchQuantLinear]`
INFO Kernel: selected -> `TorchQuantLinear`.
Packing model.layers.0.mlp.gate_proj [5 of 224] █---------------------------------------------------------------|...
```
