Hi, I'd like to implement some large-model quantization methods on top of PPQ, and I have a few questions:
1. Earlier versions of the code seem to have had a PPLCUDA_INT4_Quantizer for int4 quantization. Why was it removed? Was it because that quantizer's quantization results were poor?
2. If I implement 3-bit/4-bit model quantization based on PPQ and save the model to ONNX, how should the weights be stored? Should I encode the weights into 8 bits when calling make_tensor, i.e. store them as bytes and put them into raw_data in onnx make_tensor?
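On question 2, a common approach (a minimal sketch, not PPQ's official export path; the tensor name and shapes below are made up for illustration) is to pack two 4-bit values per uint8 byte and pass the bytes to `onnx.helper.make_tensor` with `raw=True`, which places them in `raw_data`. The true bit width, scale, and zero-point then have to be recorded separately (e.g. as extra initializers or custom node attributes) so the runtime can unpack and dequantize. Note that newer ONNX releases (1.16+) also define native INT4/UINT4 tensor types, which may remove the need for manual packing.

```python
import numpy as np
from onnx import TensorProto, helper

def pack_int4_to_uint8(w_q: np.ndarray) -> bytes:
    """Pack 4-bit values (held as uint8 in [0, 15]) two per byte:
    even indices go to the low nibble, odd indices to the high nibble."""
    flat = w_q.astype(np.uint8).ravel()
    if flat.size % 2:  # pad to an even element count
        flat = np.concatenate([flat, np.zeros(1, dtype=np.uint8)])
    packed = (flat[0::2] & 0x0F) | ((flat[1::2] & 0x0F) << 4)
    return packed.tobytes()

# Hypothetical 4-bit quantized weight, values in [0, 15].
w_q = np.random.randint(0, 16, size=(128, 128), dtype=np.uint8)

weight_tensor = helper.make_tensor(
    name="layer0.weight_packed",   # hypothetical name
    data_type=TensorProto.UINT8,   # stored dtype is uint8, not int4
    dims=[128, 64],                # packed shape: last dim halved
    vals=pack_int4_to_uint8(w_q),
    raw=True,                      # bytes land in TensorProto.raw_data
)
```

For 3-bit weights the same idea applies, but elements are no longer byte-aligned, so implementations typically pack groups (e.g. 8 values into 3 bytes) or pad each row to a byte boundary.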
### Feature request
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L1006-L1023
In the code listed above, the latest version of transformers cannot use the sliding-window feature in the Mistral model. I suspect the reason is that you...
After cloning this project with git, I tried to compile it from source. When I ran `bash ./scripts/build_pytorch_blade.sh`, I got this error; my PyTorch version (`torch.__version__`) is...
When I quantize a model, the average loss is lower in the earlier layers (0.02) than in the later layers (2.0). I'm curious whether the quantization failed due to a...
After quantizing my fine-tuned version of Qwen1.5-72B with AutoAWQ, I ran two tests: 1. a PPL run on the quantized model (test 1); 2. a human-eval run (test 2). For...
When I test Qwen1.5-72B-chat-AWQ with `bash scripts/longbench.sh`, it OOMs on an A100 80G. My config:

```yaml
model:
  type: inf-llm
  path: /root/czh/quant_models/Qwen2-geogpt-72b-0412-awq-dde-12000
  block_size: 128
  n_init: 128
  n_local: 4096
  topk: 16
  ...
```
When I ran setup_venv.sh, I encountered the following error:

```
Successfully Installed torch-mlir
rm: cannot remove '.use-iree': No such file or directory
Installing https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html...
Looking in links: https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html
Requirement already satisfied: ...
```
In Section 4.2, i.e. the InfiniBench results in Table 1, the window for Mistral 7B is 16K and for Llama3-8B it is 8K. But in Table 5 of the Appendix, for LongBench, the window size for Mistral becomes 12K, 6K. I'd like to ask two questions about this:
1. For different tasks, does the window size still have to be chosen manually offline in advance?
2. For Llama3, the paper says the window size is 8K, but the config in the repo looks like 16*128 + 4K = 6K (see the sketch after this list for how that arithmetic seems to add up). Did testing show that a 6K window size also works for Llama3?
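For what it's worth, the 16*128 + 4K = 6K arithmetic above appears to combine the retrieved memory (topk × block_size) with the local window (n_local); whether the n_init "sink" tokens count on top depends on the implementation. A minimal sketch of that accounting, reusing the parameter names from the config quoted in the earlier question:

```python
def effective_window(topk: int, block_size: int, n_local: int, n_init: int = 0) -> int:
    """Tokens visible to attention under an InfLLM-style config:
    retrieved memory blocks plus the sliding local window, optionally
    plus the initial 'sink' tokens."""
    return topk * block_size + n_local + n_init

# 16 * 128 + 4096 = 6144, i.e. the ~6K the question refers to
print(effective_window(topk=16, block_size=128, n_local=4096))               # 6144
# Counting n_init=128 as well would give 6272
print(effective_window(topk=16, block_size=128, n_local=4096, n_init=128))   # 6272
```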
### System Info
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.4.0+cu121 (True)
- [TensorRT-LLM] TensorRT-LLM version: 0.12.0
- Driver Version: 535.161.08
- CUDA Version: 12.5
- GPU: A40 (single card)

### Who can...
### Your current environment
The output of `python collect_env.py`:
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
...
```