Luchang Li
Version 0.4.19 adds an optimization that merges consecutive Slice ops into a single Slice. I want to disable this pass but don't know which pass to disable. Generally, how can we know...
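I haven't found which component performs the merge. As a workaround sketch, assuming the merge comes from one of the onnxoptimizer passes that onnxsim invokes (that attribution and the pass naming are assumptions), one could run the optimizer passes manually and filter out the suspect one:

```
# Hypothetical workaround: run onnxoptimizer passes manually and skip any
# pass suspected of merging consecutive Slice nodes. Whether the merge lives
# in onnxoptimizer or inside onnxsim itself is an assumption.
import onnx
import onnxoptimizer

model = onnx.load("model.onnx")

all_passes = onnxoptimizer.get_available_passes()
print(all_passes)  # inspect the names to find the slice-related pass

# Keep every pass except those whose names mention "slice" (assumed naming).
kept = [p for p in all_passes if "slice" not in p]
optimized = onnxoptimizer.optimize(model, kept)
onnx.save(optimized, "model_opt.onnx")
```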
**Describe the bug** Sometimes an ONNX model's graph.input contains entries that duplicate graph.initializer, for unknown reasons. Under this condition, the onnxsim...
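As a possible workaround (a minimal sketch, not part of onnxsim itself), the duplicated entries can be stripped from graph.input before simplifying:

```
# Sketch: remove graph.input entries whose names also appear in
# graph.initializer, so the simplifier treats them as constants rather
# than runtime inputs.
import onnx

model = onnx.load("model.onnx")
init_names = {init.name for init in model.graph.initializer}

for inp in list(model.graph.input):
    if inp.name in init_names:
        model.graph.input.remove(inp)

onnx.save(model, "model_fixed.onnx")
```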
clCreateCommandQueueWithProperties does not support properties such as CL_QUEUE_PRIORITY_KHR and CL_QUEUE_THROTTLE_KHR; currently only CL_QUEUE_PROPERTIES is supported.
Replacing the existing computation with the approach below noticeably reduces the next_token computation cost; it replaces the original:
```
next_token_scores = self.apply_warp(next_token_scores)
probs = npsoftmax(next_token_scores.astype(np.float64), axis=1)
# Caution:
# *** ValueError: sum(pvals[:-1].astype(np.float64)) > 1.0. The pvals array is cast to 64-bit floating point prior to checking the...
```
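One way such a replacement could look (a sketch, not the author's exact code) is to sample the next token straight from the warped logits with the Gumbel-max trick, which avoids both the float64 softmax and np.random.multinomial's strict sum(pvals) check:

```
# Sketch: Gumbel-max sampling over (batch, vocab) logits. argmax(logits + g)
# with g ~ Gumbel(0, 1) draws from Categorical(softmax(logits)) without
# computing the softmax or normalizing probabilities explicitly.
import numpy as np

def sample_next_token(next_token_scores: np.ndarray) -> np.ndarray:
    gumbel = np.random.gumbel(size=next_token_scores.shape)
    return np.argmax(next_token_scores + gumbel, axis=1)
```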
I can't download the file "https://the-eye.eu/public/AI/pile/val.jsonl.zst" in get_calib_dataset, can we use other data to replace it? Thanks a lot.
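A minimal sketch of a replacement, assuming get_calib_dataset only needs raw text to tokenize into fixed-length blocks; here wikitext-2 from Hugging Face datasets stands in for the Pile validation split (the function name, sample count, and block size are assumptions):

```
from datasets import load_dataset

def get_calib_dataset_wikitext(tokenizer, n_samples=128, block_size=512):
    # Hypothetical stand-in for get_calib_dataset: use wikitext-2 instead of
    # the unreachable Pile val.jsonl.zst.
    ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    text = "\n\n".join(t for t in ds["text"] if t.strip())
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    return [ids[i * block_size:(i + 1) * block_size].unsqueeze(0)
            for i in range(n_samples)]
```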
When using transformers version 4.35.2, I got this error, and a similar error when quantizing LLaMA: it seems you are using version
OmniQuant-main/models/int_falcon_layer.py", line 52, in __init__
    self.maybe_rotary = copy.deepcopy(org_module.maybe_rotary)
  File "local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'FalconAttention' object has no attribute 'maybe_rotary'
transformers version:...
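A possible guard (an assumption on my side, based on newer transformers releases no longer exposing FalconAttention.maybe_rotary) would be to copy the attribute only when it exists:

```
# Hypothetical patch around int_falcon_layer.py line 52: deep-copy
# maybe_rotary only when the wrapped module still has it, otherwise fall
# back to None.
import copy

def copy_maybe_rotary(org_module):
    if hasattr(org_module, "maybe_rotary"):
        return copy.deepcopy(org_module.maybe_rotary)
    return None
```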
I evaluated Qwen1.5-7B-Chat and two quantized versions on MMLU and found that the AWQ score is remarkably low, even worse than naive 4-bit. What could be going on? The float model scores 0.60 and the GPTQ version 0.59, but the AWQ version only reaches 0.45, while the naive version gets 0.589. GPTQ and AWQ quantized models: https://huggingface.co/Qwen/Qwen1.5-7B-Chat-AWQ https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
**Describe the bug** I use onnxsim-0.4.36 to simplify a model and get this error: Traceback (most recent call last): File...