davidray222

Search results: 8 issues by davidray222

Thank you for providing such an excellent solution. I have a small question: does auto-gptq only support reading models in .safetensors format? If I want to use a .bin format...

Thank you for providing this great work! I want to ask about merge.py: `import torch model_path = 'path of the quantized model' lora_path = 'path of the saved LoRA...`

Thank you for this excellent work!! May I ask a question? I quantized llama-7b and ran qalora.py, and hit a problem: the line `except triton.compiler.OutOfResources:` raises `AttributeError: module 'triton.compiler' has no attribute 'OutOfResources'`...
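That `AttributeError` usually means the installed Triton no longer exposes `OutOfResources` under `triton.compiler` (newer releases moved it; in some versions it appears under `triton.runtime.errors`). A minimal, version-tolerant lookup, sketched here against a stand-in module rather than a real Triton install (the stand-in and the helper name are illustrative, not part of the repo):

```python
from types import SimpleNamespace

def find_out_of_resources(triton_module):
    """Locate Triton's OutOfResources exception across versions.

    Older Triton: triton.compiler.OutOfResources
    Some newer releases: triton.runtime.errors.OutOfResources
    Falls back to the generic Exception if neither attribute exists.
    """
    for path in ("compiler", "runtime.errors"):
        obj = triton_module
        for part in path.split("."):
            obj = getattr(obj, part, None)
            if obj is None:
                break
        exc = getattr(obj, "OutOfResources", None)
        if exc is not None:
            return exc
    return Exception

# Demo with a stand-in mimicking a newer Triton layout (hypothetical module).
class OutOfResources(Exception):
    pass

fake_triton = SimpleNamespace(
    runtime=SimpleNamespace(errors=SimpleNamespace(OutOfResources=OutOfResources))
)
print(find_out_of_resources(fake_triton) is OutOfResources)  # → True
```

Catching the class returned by such a lookup keeps the `except` clause working across Triton versions instead of hard-coding one attribute path.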

Thank you for providing such excellent methods. I would like to ask how you load and use the quantized models. Thank you!

Thank you for providing such outstanding research! I tested the llama7b model, and after pruning, neither the memory usage nor the inference speed is significantly different from the original model....
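One common explanation for that observation (an illustration, not a claim about this repository's method): if the pruning is unstructured and only zeroes individual weights, the tensors stay dense, so neither storage nor dense-matmul time shrinks unless the zeros are exploited by a sparse format or by structured removal. A small numpy sketch:

```python
import numpy as np

# A dense "weight matrix" before pruning.
w = np.random.rand(512, 512).astype(np.float32)
bytes_before = w.nbytes

# Unstructured magnitude pruning: zero out roughly half the entries.
w[np.abs(w) < 0.5] = 0.0

# The zeros are still stored explicitly, so memory is unchanged,
# and a dense matmul would still touch every entry.
print(w.nbytes == bytes_before)  # → True
```

Realized savings typically require converting to a sparse representation or physically removing rows/columns/heads, which is why zeroed-weight pruning alone often shows no memory or latency benefit.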

Hello, thank you for providing such a useful method, but I encountered some problems while pruning llama-7b. Environment: python 3.10, torch 2.6.0, transformers 4.49.0, accelerate 1.5.2. My command: `python prune_llm.py --model...`

Hello, thank you for providing such a useful method, but I encountered some problems while pruning llama-7b. Environment: python 3.10, torch 2.6.0, transformers 4.49.0, accelerate 1.5.2. Command: `python prune_llm.py --model huggyllama/llama-7b...`

**Describe the bug**
```
INFO Packing model...
INFO Packing Kernel: Auto-selection: adding candidate `TorchQuantLinear`
INFO Kernel: candidates -> `[TorchQuantLinear]`
INFO Kernel: selected -> `TorchQuantLinear`.
Packing model.layers.0.mlp.gate_proj [5 of 224] █---------------------------------------------------------------|...
```
