Jelly Lee
The article mentions that GPTQ can be used to further quantize the model. What quantization method is currently being used?
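For context, here is a minimal sketch of how GPTQ-style 4-bit quantization is commonly invoked via the AutoGPTQ library; the model path, calibration text, and parameters are illustrative assumptions, not details from the question above.

```python
# Sketch: GPTQ 4-bit quantization with the AutoGPTQ library.
# Model path and parameters below are assumptions for illustration.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "THUDM/chatglm-6b"  # hypothetical target model
quantize_config = BaseQuantizeConfig(
    bits=4,         # quantize weights to 4-bit integers
    group_size=128, # one scale/zero-point per group of 128 weights
)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_pretrained(
    model_path, quantize_config, trust_remote_code=True
)

# GPTQ needs a small calibration set to minimize layer-wise quantization error.
examples = [tokenizer("A short calibration sentence.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("chatglm-6b-4bit")
```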
### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior Output: ``` User: 类型#裙*版型#显瘦*风格#文艺*风格#简约*图案#印花*图案#撞色*裙下摆#压褶*裙长#连衣裙*裙领型#圆领 ChatGLM-6B: WHICH衣的衣的衣,衣,衣,衣的衣,衣,衣,衣,衣,衣的衣,衣的衣,衣的衣,衣的衣,衣,衣的 "&衣,衣,衣的 "\"的 "\"的衣,衣, "\"的衣 ```...
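For reproduction, the load-and-quantize path below follows the pattern documented in the ChatGLM-6B README; the prompt is the ADGEN-style input from the report, and everything else is assumed rather than taken from the issue.

```python
# Sketch: ChatGLM-6B with on-the-fly INT4 weight quantization, following the
# ChatGLM-6B README pattern; the prompt is the ADGEN-style input reported above.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = model.quantize(4).half().cuda()  # quantize weights to INT4
model = model.eval()

prompt = "类型#裙*版型#显瘦*风格#文艺*风格#简约*图案#印花*图案#撞色*裙下摆#压褶*裙长#连衣裙*裙领型#圆领"
response, history = model.chat(tokenizer, prompt, history=[])
print(response)
```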
### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior Does P-Tuning v2 support data parallelism via DeepSpeed? I found that the loss from running P-Tuning v2 with DeepSpeed data parallelism differs greatly from the loss when running P-Tuning v2 alone or DeepSpeed data parallelism alone. Results: #### Full fine-tuning(dp:4) ``` train metrics...
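One possible source of such a discrepancy, offered here only as a hypothesis: under data parallelism the effective global batch size is multiplied by the world size, so loss curves are not directly comparable unless batch size and learning rate are matched. A toy calculation, with all numbers assumed for illustration:

```python
# Effective global batch size under DeepSpeed data parallelism; if this does
# not match the single-process run, loss curves are not directly comparable.
# All numbers below are illustrative assumptions, not from the report.
per_device_batch = 4
grad_accum_steps = 4
world_size = 4  # dp:4 as in the report

single_process_batch = per_device_batch * grad_accum_steps             # 16
deepspeed_global_batch = per_device_batch * grad_accum_steps * world_size  # 64
print(single_process_batch, deepspeed_global_batch)
```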
After quantization, inference speed of BELLE-7B (BLOOM) drops significantly; BELLE-7B (LLaMA) also slows down somewhat after quantization. Code: ``` import time import torch import torch.nn as nn from gptq import * from modelutils import * from quant import * from transformers import AutoTokenizer from random...
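To compare throughput before and after quantization on equal footing, a minimal timing sketch is below; it assumes a Hugging Face model/tokenizer loaded as in the snippet above, a CUDA device, and an illustrative prompt.

```python
# Sketch: measure generation throughput (tokens/sec) for a loaded HF model.
# Assumes CUDA is available; model/tokenizer loading follows the snippet above.
import time
import torch

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed
```

A slowdown like the one reported is plausible if the quantized path runs unfused dequantize-then-matmul kernels, which can be slower than FP16 cuBLAS despite the smaller weights.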
When I run this [demo](https://github.com/alpa-projects/alpa/blob/hao-opt/examples/opt_finetune/run_clm_flax.py), an error occurs: ``` INFO:__main__:***** Running training ***** INFO:__main__: Num examples = 117750 INFO:__main__: Num Epochs = 8 INFO:__main__: Batch size per device (w....
Hi, I found that TRT-LLM KV-cache quantization leads to serious model accuracy loss, while vLLM and LMDeploy show only minor loss. - model: qwen1.5-7b - evalset: cmmlu
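To illustrate where KV-cache quantization error comes from, here is a toy sketch of symmetric per-tensor INT8 quantize/dequantize; this is an illustration of the general technique, not TensorRT-LLM's actual kernel, and all shapes are made up.

```python
# Toy sketch: symmetric per-tensor INT8 KV-cache quantization, showing the
# rounding error it introduces. Not TensorRT-LLM's actual implementation.
import torch

def quant_dequant_int8(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().max() / 127.0           # per-tensor symmetric scale
    q = torch.clamp(torch.round(x / scale), -128, 127)
    return q * scale                        # dequantized approximation

kv = torch.randn(2, 32, 128)                # fake (heads, seq, head_dim) cache
err = (kv - quant_dequant_int8(kv)).abs().mean()
print(f"mean abs rounding error: {err:.5f}")
```

Outliers inflate a per-tensor scale and magnify this error; finer-grained scales (per-channel or per-token) generally reduce it, which may explain why different runtimes show different accuracy loss.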