PaddleNLP

support chatglmv2 infer with block_attn fp16 & wint8

Open xue-yun-liang opened this issue 1 year ago • 3 comments

PR types

Feature

PR changes

Models

Description

This PR adds high-performance inference support for chatglmv2 and chatglmv3 in the block_attn network-construction mode, in both fp16 and weight-only int8 quantization. All block_attn fp16 and weight-only int8 variants of chatglmv2 and chatglmv3 now run end to end. The reproduction commands are below (for chatglm3, substitute its model name and the same commands apply).

fp16, dynamic graph:
python predict/predictor.py --model_name_or_path THUDM/chatglm2-6b --dtype float16 --output_file ./output.json --decode_strategy greedy_search --mode dynamic --inference_model --batch_size 1 --block_attn 1

fp16, dynamic-to-static export:
python predict/export_model.py --model_name_or_path THUDM/chatglm2-6b --output_path /root/.cache/paddlenlp/exported_model/THUDM/chatglm2-6b --dtype float16 --inference_model --block_attn 1 --batch_size 1

fp16, static graph:
python predict/predictor.py --model_name_or_path /root/.cache/paddlenlp/exported_model/THUDM/chatglm2-6b --dtype float16 --output_file ./output.json --mode static --inference_model --batch_size 1 --block_attn 1
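The three fp16 steps follow one pattern, so they can be generated for either model. The sketch below is a hypothetical helper (`build_commands` and `EXPORT_ROOT` are illustrative names, not part of PaddleNLP) that reuses only the flags shown in this PR. It assumes the export directory is always `exported_model/<model name>` (the PR shows this only for chatglm2), that the weight-only int8 run differs only by `--quant_type weight_only_int8` and a `-wint8` export suffix, as in the int8 commands, and that the chatglm3 hub name is `THUDM/chatglm3-6b`.

```python
import shlex

# Assumption: exported models live under this root, as in the PR's commands.
EXPORT_ROOT = "/root/.cache/paddlenlp/exported_model"


def build_commands(model_name, quant_type=None):
    """Return the dynamic, export and static command lines as argv lists.

    Hypothetical helper that mirrors the flags used in the PR description.
    """
    if quant_type not in (None, "weight_only_int8"):
        raise ValueError(f"unsupported quant_type: {quant_type}")
    # Only the weight-only int8 run gets a "-wint8" suffix on the export directory.
    suffix = "-wint8" if quant_type else ""
    export_dir = f"{EXPORT_ROOT}/{model_name}{suffix}"
    quant = ["--quant_type", quant_type] if quant_type else []

    dynamic = [
        "python", "predict/predictor.py",
        "--model_name_or_path", model_name,
        "--dtype", "float16",
        "--output_file", "./output.json",
        "--decode_strategy", "greedy_search",
        "--mode", "dynamic",
        "--inference_model",
        "--batch_size", "1",
        "--block_attn", "1",
    ] + quant
    export = [
        "python", "predict/export_model.py",
        "--model_name_or_path", model_name,
        "--output_path", export_dir,
        "--dtype", "float16",
        "--inference_model",
        "--block_attn", "1",
        "--batch_size", "1",
    ] + quant
    static = [
        "python", "predict/predictor.py",
        # The static run loads the exported model instead of the hub name.
        "--model_name_or_path", export_dir,
        "--dtype", "float16",
        "--output_file", "./output.json",
        "--mode", "static",
        "--inference_model",
        "--batch_size", "1",
        "--block_attn", "1",
    ] + quant
    return {"dynamic": dynamic, "export": export, "static": static}


if __name__ == "__main__":
    # Assumed hub name for chatglm3; swap in whichever model you want to test.
    for step, argv in build_commands("THUDM/chatglm3-6b").items():
        print(f"{step}: {shlex.join(argv)}")
```

Returning argv lists rather than strings means each step could be passed directly to `subprocess.run` in order, and `shlex.join` is used only for display.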

weight-only int8, dynamic graph:
python predict/predictor.py --model_name_or_path THUDM/chatglm2-6b --dtype float16 --output_file ./output.json --decode_strategy greedy_search --mode dynamic --inference_model --batch_size 1 --block_attn 1 --quant_type weight_only_int8

weight-only int8, dynamic-to-static export:
python predict/export_model.py --model_name_or_path THUDM/chatglm2-6b --output_path /root/.cache/paddlenlp/exported_model/THUDM/chatglm2-6b-wint8 --dtype float16 --inference_model --block_attn 1 --batch_size 1 --quant_type weight_only_int8

weight-only int8, static graph:
python predict/predictor.py --model_name_or_path /root/.cache/paddlenlp/exported_model/THUDM/chatglm2-6b-wint8 --dtype float16 --output_file ./output.json --mode static --inference_model --batch_size 1 --block_attn 1 --quant_type weight_only_int8
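Weight-only int8 means the linear-layer weights are stored as int8 with a floating-point scale, while activations stay in the compute dtype (float16 here). The pure-Python sketch below illustrates one common scheme, symmetric per-output-channel absmax quantization. It is a generic illustration, not the kernel PaddleNLP or Paddle uses (their weight layout and rounding may differ), and the function names are hypothetical.

```python
def quantize_per_channel(weight):
    """Symmetric absmax int8 quantization, one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row)
        # Map [-absmax, absmax] onto [-127, 127]; an all-zero row keeps scale 1.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def weight_only_linear(x, q_rows, scales):
    """y[j] = scales[j] * sum_i x[i] * q[j][i]; x stays in floating point."""
    return [s * sum(xi * qi for xi, qi in zip(x, row))
            for row, s in zip(q_rows, scales)]


def reference_linear(x, weight):
    """Unquantized y[j] = sum_i x[i] * w[j][i], for comparison."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in weight]


if __name__ == "__main__":
    w = [[0.3, -1.0, 0.2], [2.0, 0.0, -0.7]]  # 2 output channels, 3 inputs
    x = [1.0, 2.0, 3.0]
    q, s = quantize_per_channel(w)
    print("int8 weights:", q)
    print("quantized:", weight_only_linear(x, q, s))
    print("reference:", reference_linear(x, w))
```

Each stored weight is off by at most half a scale step, so output `j` is off by at most `scales[j] / 2 * sum(|x|)`. That bounded error is the cost of storing weights in one byte instead of fp16's two.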

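One issue remains: running chatglm3 in block_attn mode shows a slight discrepancy that resembles a precision problem. It will be addressed in the next PR. The output is as follows:

[screenshot of the output]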

xue-yun-liang • Aug 06 '24 10:08

Thanks for your contribution!

paddle-bot[bot] • Aug 06 '24 10:08

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

CLAassistant • Aug 06 '24 10:08

Codecov Report

Attention: Patch coverage is 0.91743% with 108 lines in your changes missing coverage. Please review.

Project coverage is 53.89%. Comparing base (aaacb32) to head (a3952d5). Report is 667 commits behind head on develop.

Files with missing lines | Patch % | Lines
...p/experimental/transformers/chatglm_v2/modeling.py | 0.00% | 88 Missing ⚠️
paddlenlp/utils/llm_utils.py | 0.00% | 19 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py | 0.00% | 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.91%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

❌ Your project check has failed because the head coverage (53.89%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.
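The patch figures are mutually consistent: the per-file table lists 88 + 19 + 1 = 108 missing lines, and a coverage of 0.91743% with 108 misses corresponds to exactly one covered line out of 109. The sketch below checks this and the resulting pass/fail verdict. It assumes Codecov rounds percentages down to two decimals (its default setting, and consistent with 0.91743% being shown as 0.91%); the helper names are illustrative.

```python
import math

MISSING = 88 + 19 + 1        # missing lines per file, from the table
REPORTED_PATCH_PCT = 0.91743  # patch coverage as reported, 5 decimals
PATCH_TARGET = 80.0


def round_down(pct, precision=2):
    """Truncate a percentage to `precision` decimals (assumed Codecov default)."""
    factor = 10 ** precision
    return math.floor(pct * factor) / factor


def covered_lines(missing, reported_pct, max_hits=10_000):
    """Smallest hit count h such that h / (h + missing) matches the report."""
    for hits in range(max_hits + 1):
        pct = 100.0 * hits / (hits + missing)
        if abs(pct - reported_pct) < 5e-6:  # agrees to 5 decimal places
            return hits
    raise ValueError("no hit count reproduces the reported percentage")


if __name__ == "__main__":
    hits = covered_lines(MISSING, REPORTED_PATCH_PCT)
    shown = round_down(100.0 * hits / (hits + MISSING))
    print(f"{hits} of {hits + MISSING} patch lines covered -> {shown}%")
    print("patch check passes" if shown >= PATCH_TARGET else "patch check fails")
```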

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8881      +/-   ##
===========================================
- Coverage    54.38%   53.89%   -0.49%     
===========================================
  Files          648      650       +2     
  Lines       103266   104337    +1071     
===========================================
+ Hits         56161    56236      +75     
- Misses       47105    48101     +996     
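The project-level numbers in the diff can be recomputed directly: coverage is hits divided by tracked lines, and in both snapshots the misses equal lines minus hits (no partial lines are listed). The sketch below, assuming Codecov's round-down-to-two-decimals setting, reproduces 54.38%, 53.89% and the -0.49% change, and shows why the 58% project target fails. The dictionary layout and function names are illustrative.

```python
import math

PROJECT_TARGET = 58.0

# Snapshot figures copied from the coverage diff.
BASE = {"lines": 103266, "hits": 56161, "misses": 47105}
HEAD = {"lines": 104337, "hits": 56236, "misses": 48101}


def round_down(pct, precision=2):
    """Truncate a percentage to `precision` decimals (assumed Codecov default)."""
    factor = 10 ** precision
    return math.floor(pct * factor) / factor


def coverage(snapshot):
    # No partial lines are reported, so every tracked line is a hit or a miss.
    assert snapshot["hits"] + snapshot["misses"] == snapshot["lines"]
    return round_down(100.0 * snapshot["hits"] / snapshot["lines"])


if __name__ == "__main__":
    base_pct, head_pct = coverage(BASE), coverage(HEAD)
    delta = round(head_pct - base_pct, 2)
    print(f"base {base_pct}%, head {head_pct}%, change {delta:+}%")
    print("project check passes" if head_pct >= PROJECT_TARGET else "project check fails")
```

The PR adds 1071 tracked lines but only 75 new hits, so almost all of the added code counts as misses, which is what pulls head coverage down.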


codecov[bot] • Aug 06 '24 12:08

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions[bot] • Dec 14 '24 00:12

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions[bot] • Feb 14 '25 00:02

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions[bot] • Apr 17 '25 00:04