Hi, I'd like to implement some large-model quantization methods on top of PPQ, and I have a few questions:
1. Earlier versions of the code seem to have had a PPLCUDA_INT4_Quantizer for int4 quantization. Why was it removed? Was it because that quantizer's quantization results were poor?
2. If I implement 3-bit/4-bit model quantization based on PPQ and save the model to ONNX, how should the weights be stored? Should I encode the weights into 8 bits when calling make_tensor, i.e. store them as bytes and put them into raw_data in onnx make_tensor?
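On question 2, a common approach (a minimal sketch, not PPQ's official export path; the tensor name and shapes below are made up for illustration) is to pack two 4-bit values per uint8 byte and pass the bytes to `onnx.helper.make_tensor` with `raw=True`, which places them in `raw_data`. The true bit width, scale, and zero-point then have to be recorded separately (e.g. as extra initializers or custom node attributes) so the runtime can unpack and dequantize. Note that newer ONNX releases (1.16+) also define native INT4/UINT4 tensor types, which may remove the need for manual packing.

```python
import numpy as np
from onnx import TensorProto, helper

def pack_int4_to_uint8(w_q: np.ndarray) -> bytes:
    """Pack 4-bit values (held as uint8 in [0, 15]) two per byte:
    even indices go to the low nibble, odd indices to the high nibble."""
    flat = w_q.astype(np.uint8).ravel()
    if flat.size % 2:  # pad to an even element count
        flat = np.concatenate([flat, np.zeros(1, dtype=np.uint8)])
    packed = (flat[0::2] & 0x0F) | ((flat[1::2] & 0x0F) << 4)
    return packed.tobytes()

# Hypothetical 4-bit quantized weight, values in [0, 15].
w_q = np.random.randint(0, 16, size=(128, 128), dtype=np.uint8)

weight_tensor = helper.make_tensor(
    name="layer0.weight_packed",   # hypothetical name
    data_type=TensorProto.UINT8,   # stored dtype is uint8, not int4
    dims=[128, 64],                # packed shape: last dim halved
    vals=pack_int4_to_uint8(w_q),
    raw=True,                      # bytes land in TensorProto.raw_data
)
```

For 3-bit weights the same idea applies, but elements are no longer byte-aligned, so implementations typically pack groups (e.g. 8 values into 3 bytes) or pad each row to a byte boundary.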
### Feature request
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L1006-L1023
In the code listed above, the latest version of transformers cannot use the sliding-window feature in the Mistral model. I suspect the reason is that you...
After cloning this project with git, I tried to compile it from source. When I ran `bash ./scripts/build_pytorch_blade.sh`, I got this error; my PyTorch version (`torch.__version__`) is...
When I quantize a model, the average loss is lower in the earlier layers (0.02) than in the later layers (2.0). I'm curious whether the quantization failed due to a...
After quantizing my fine-tuned version of Qwen1.5-72B with AutoAWQ, I ran two tests: 1. a PPL run on the quantized model (test 1); 2. a human-eval run (test 2). For...
When I test Qwen1.5-72B-chat-AWQ with `bash scripts/longbench.sh`, it OOMs on an A100 80G. My config:

```yaml
model:
  type: inf-llm
  path: /root/czh/quant_models/Qwen2-geogpt-72b-0412-awq-dde-12000
  block_size: 128
  n_init: 128
  n_local: 4096
  topk: 16
  ...
```
When I ran setup_venv.sh, I encountered the following error:

```
Successfully Installed torch-mlir
rm: cannot remove '.use-iree': No such file or directory
Installing https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html...
Looking in links: https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html
Requirement already satisfied: ...
```
In Section 4.2, i.e. the InfiniBench results in Table 1, the window for Mistral 7B is 16K and for Llama3-8B it is 8K. But in Table 5 of the Appendix, for LongBench, the window size for Mistral becomes 12K, 6K. I'd like to ask two questions about this:
1. For different tasks, does the window size still have to be chosen manually offline in advance?
2. For Llama3, the paper says the window size is 8K, but the config in the repo looks like 16*128 + 4K = 6K (see the sketch after this list for how that arithmetic seems to add up). Did testing show that a 6K window size also works for Llama3?
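For what it's worth, the 16*128 + 4K = 6K arithmetic above appears to combine the retrieved memory (topk × block_size) with the local window (n_local); whether the n_init "sink" tokens count on top depends on the implementation. A minimal sketch of that accounting, reusing the parameter names from the config quoted in the earlier question:

```python
def effective_window(topk: int, block_size: int, n_local: int, n_init: int = 0) -> int:
    """Tokens visible to attention under an InfLLM-style config:
    retrieved memory blocks plus the sliding local window, optionally
    plus the initial 'sink' tokens."""
    return topk * block_size + n_local + n_init

# 16 * 128 + 4096 = 6144, i.e. the ~6K the question refers to
print(effective_window(topk=16, block_size=128, n_local=4096))               # 6144
# Counting n_init=128 as well would give 6272
print(effective_window(topk=16, block_size=128, n_local=4096, n_init=128))   # 6272
```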
### System Info
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.4.0+cu121 (True)
- [TensorRT-LLM] TensorRT-LLM version: 0.12.0
- Driver Version: 535.161.08
- CUDA Version: 12.5
- GPU: A40 (single card)

### Who can...
### Your current environment
The output of `python collect_env.py`:
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
...
```