Alexander

23 comments by Alexander

> 1. We might need to look at FreezeModule preserved attributes

Yes, I tried using FreezeModule preserved attributes, but it failed.

@wanyne-yyds Which model are you using, v6s_reopt or v6s?

@wanyne-yyds The op names in op_concat_fusion_list come from v6s_reopt. If you apply partial PTQ to v6s instead, you need to modify the op names in op_concat_fusion_list accordingly.

@dejavvuu 1. You can remove the quant/de-quant ops from the ONNX graph before deployment. 2. A drop of ~0.1 mAP is normal.
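To illustrate the first point, here is a minimal sketch of stripping quant/de-quant nodes and rewiring their consumers. It works on a toy graph (a list of dicts), not on a real ONNX model; the helper name `strip_qdq` and the node layout are hypothetical. Only the op type names `QuantizeLinear` and `DequantizeLinear` come from ONNX itself. A real export would also need the graph outputs and the now unused scale/zero-point initializers cleaned up.

```python
# Toy sketch: nodes are plain dicts, not onnx.NodeProto objects (assumption).
QDQ_OPS = {"QuantizeLinear", "DequantizeLinear"}


def strip_qdq(nodes):
    """Drop Q/DQ nodes and reconnect their consumers to the original tensor."""
    alias = {}  # output name of a removed node -> its data input name
    kept = []
    for node in nodes:
        if node["op_type"] in QDQ_OPS:
            # The first input of a Q/DQ node is the data tensor;
            # the remaining inputs are scale and zero point.
            alias[node["outputs"][0]] = node["inputs"][0]
        else:
            kept.append(node)

    def resolve(name):
        # Follow chains such as DequantizeLinear(QuantizeLinear(x)) back to x.
        while name in alias:
            name = alias[name]
        return name

    return [dict(node, inputs=[resolve(i) for i in node["inputs"]]) for node in kept]


nodes = [
    {"op_type": "QuantizeLinear", "inputs": ["input", "s0", "z0"], "outputs": ["input_q"]},
    {"op_type": "DequantizeLinear", "inputs": ["input_q", "s0", "z0"], "outputs": ["input_dq"]},
    {"op_type": "Conv", "inputs": ["input_dq", "weight"], "outputs": ["conv_out"]},
    {"op_type": "Relu", "inputs": ["conv_out"], "outputs": ["relu_out"]},
]
stripped = strip_qdq(nodes)
```

After the call, `stripped` contains only the `Conv` and `Relu` nodes, and `Conv` reads directly from `input`.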

Calibration:

    HF_MODEL=./llama-2-7b
    WORK_DIR=../llama-2-7b-awq
    python3 -m lmdeploy.lite.apis.calibrate \
      --model $HF_MODEL \
      --calib_dataset 'c4' \
      --calib_samples 128 \
      --calib_seqlen 2048 \
      --work_dir $WORK_DIR

Quantization:

    HF_MODEL=./llama-2-7b
    WORK_DIR=../llama-2-7b-awq-64
    python3 -m lmdeploy.lite.apis.auto_awq \
      --model $HF_MODEL...

@irexyc No, I ran through the conversion and deployment process myself using the official model.

@CaffreyR get_act_scales is used to smooth the activations and weights; get_static_decoder_layer_scales is used to quantize the activations and weights.
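As a rough illustration of the two steps, the sketch below assumes a SmoothQuant-style scheme: per-channel smoothing factors are derived from the activation and weight maxima, then a fixed (static) scale is used for int8 quantization. None of these helpers are the real `get_act_scales` or `get_static_decoder_layer_scales`; the names, the `alpha=0.5` default, and the toy data are assumptions for illustration only.

```python
def absmax_per_column(rows):
    """Largest absolute value in each column of a 2D list."""
    return [max(abs(row[j]) for row in rows) for j in range(len(rows[0]))]


def smoothing_scales(act_absmax, weight_absmax, alpha=0.5, eps=1e-5):
    """Per input channel: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    return [max(a, eps) ** alpha / max(w, eps) ** (1.0 - alpha)
            for a, w in zip(act_absmax, weight_absmax)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def quantize_static(rows, scale):
    """Symmetric int8 quantization with a fixed, precomputed scale."""
    return [[max(-127, min(127, round(v / scale))) for v in row] for row in rows]


acts = [[10.0, 0.1], [-8.0, 0.2]]      # tokens x input channels (toy data)
weights = [[0.5, -0.25], [2.0, 1.0]]   # input channels x output channels

act_absmax = absmax_per_column(acts)
weight_absmax = [max(abs(w) for w in row) for row in weights]
scales = smoothing_scales(act_absmax, weight_absmax)

# Smoothing: divide activations by s and multiply weight rows by s,
# which leaves the product X @ W unchanged (up to rounding error).
smoothed_acts = [[x / s for x, s in zip(row, scales)] for row in acts]
smoothed_weights = [[w * s for w in row] for row, s in zip(weights, scales)]

# Static quantization: the scale is fixed from calibration statistics.
act_scale = max(absmax_per_column(smoothed_acts)) / 127
quantized_acts = quantize_static(smoothed_acts, act_scale)
```

The smoothing step moves part of the activation outlier range into the weights, so the fixed scale wastes fewer int8 levels on a single large channel.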

Please download it from this website: http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/

@yang0817manman Make sure the image is 128x128 (width x height).

vgg16+s3fd is unlikely to run in real time on a mobile device. If you want a real-time model, you can use mv2+s3fd: https://github.com/lippman1125/S3FD.PyTorch