Results 12 comments of star

Hi, could you provide the hardware information of the machine running the code, as well as the torch and transformers versions?

These are the results on a 2080 Ti for reference: transformers==4.19.0, torch==1.12.0. ![image](https://user-images.githubusercontent.com/30221696/181414465-92178905-dc47-49ae-b42e-346b2c3296aa.png) In addition, setting the `using_half` parameter in the test code to `True` enables EET fp16 inference and gives a better speedup.
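As a rough illustration of why an fp16 flag helps (EET's internals behind `using_half` are not shown here; this toy layer is just a stand-in for a transformer weight matrix), halving the parameter width halves memory traffic, and tensor-core GPUs such as the 2080 Ti also run fp16 matmuls considerably faster:

```python
import torch

# A toy layer standing in for a transformer weight matrix.
layer = torch.nn.Linear(1024, 1024)
fp32_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())

layer.half()  # convert parameters to fp16 in place
fp16_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())

# fp16 parameters occupy exactly half the memory of fp32 ones.
print(fp32_bytes // fp16_bytes)  # → 2
```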

> NVIDIA A100-SXM, torch: 1.10.1+cu111, transformers: 4.20.1, cuda: 11.1, cudatoolkit: 11.3.1, cudnn: 8.0.4, Driver Version: 515.48.07. Thank you very much for the prompt reply; that is the environment I am using. Could you provide the detailed hardware information behind the result you just posted?

NVIDIA GeForce RTX 2080 Ti, cuda: 11.6, cudnn: 8.3.3, Driver version: 470.82.01

Same question here: the multi-head QKV ordering in the ChatGLM-6B model differs from the standard GLM model. Is there an adapted version?
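For illustration, a minimal sketch of what such an adaptation involves, regrouping a fused QKV weight from a per-head interleaved layout into an "all Q, then all K, then all V" layout. Both layouts here are assumptions for the sketch, not EET's or ChatGLM-6B's actual tensor formats:

```python
import numpy as np

hidden_size, num_heads = 8, 2
head_dim = hidden_size // num_heads

# Hypothetical fused QKV weight whose output rows are interleaved per head:
# [q_h0, k_h0, v_h0, q_h1, k_h1, v_h1, ...].
w_interleaved = np.random.randn(3 * hidden_size, hidden_size).astype(np.float32)

# View as (head, qkv, head_dim, in), then move the qkv axis to the front so
# all Q rows come first, then all K rows, then all V rows.
w = w_interleaved.reshape(num_heads, 3, head_dim, hidden_size)
w_split = w.transpose(1, 0, 2, 3).reshape(3 * hidden_size, hidden_size)

# Q rows of head 0 stay at the top; K rows of head 0 move into the K block.
```

The same permutation has to be applied to the fused QKV bias for the converted checkpoint to stay consistent.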

Could you please provide the environment you use? From your information, the Thrust library, which ships with CUDA, is not installed correctly. Also, we recommend using the Dockerfile in the EET repository...

@520jefferson Which versions of nvcc and CUDA are you using? Do torch and transformers work properly?

@520jefferson Can you provide your Dockerfile? We will try to figure out what went wrong.

Thanks for the feedback. If you have any questions, please contact us.

@byshiue Thanks for your reply. I tried adding the parameter `--kv_cache_dtype fp8`, but the performance didn't seem to improve.

```
python quantize.py --model_dir ${WORK_HF_DIR} \
    --dtype float16 \
    --qformat fp8 \
    --output_dir...
```
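For context on why an fp8 KV cache mainly saves memory rather than latency at small batch sizes, here is a toy sketch using plain 8-bit integer quantization in numpy as a stand-in. TensorRT-LLM's real fp8 path uses hardware FP8 formats and calibrated scales, so this only illustrates the memory trade-off, not the actual implementation:

```python
import numpy as np

# Toy fp16 KV cache block: (num_tokens, num_heads, head_dim).
kv = np.random.randn(128, 8, 64).astype(np.float16)

# Per-tensor 8-bit quantization: one scale for the whole block.
scale = np.abs(kv).max() / 127.0
kv_q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)

# The quantized cache uses half the bytes of the fp16 one; values are
# dequantized back on read, at the cost of a small rounding error.
kv_deq = kv_q.astype(np.float16) * scale
```

Since decoding at small batch sizes is typically bandwidth-bound on weights rather than on the KV cache, the memory saving does not necessarily translate into higher throughput.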