chenbohua3
@SinDongHwan Thanks a lot~
I think so, did you figure it out?
Some updates: I modified the `neon_gemm_qasymm8.cpp` example to reproduce the error; the modified code is attached. Just run the command `LD_LIBRARY_PATH=build ./build/examples/neon_gemm_qasymm8 1 1 1` and we will get:...
@morgolock Please help take a look at this question, thank you :)
Some updates: Under the same `src1`/`src2`/`dst` settings, if I create/configure/run `NEGEMMLowpMatrixMultiplyCore` and `NEGEMMLowpOutputStage` separately, then the result is correct; otherwise, the result is wrong. Here is the code in which...
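For context, here is a minimal sketch (my own assumption of the "separate" path described above, not the attached code) of configuring and running `NEGEMMLowpMatrixMultiplyCore` and `NEGEMMLowpOutputStage` as two independent functions. The 1x1 shapes, quantization scales/offsets, and the fixed-point multiplier/shift are placeholder values, and it assumes an ACL version where `NEGEMMLowpOutputStage` takes a `GEMMLowpOutputStageInfo`:

```cpp
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
    // Placeholder 1x1x1 GEMM; shapes, scales and offsets are illustrative only.
    Tensor src1, src2, dst_s32, dst_q8;
    src1.allocator()->init(TensorInfo(TensorShape(1U, 1U), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10)));
    src2.allocator()->init(TensorInfo(TensorShape(1U, 1U), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 5)));
    dst_s32.allocator()->init(TensorInfo(TensorShape(1U, 1U), 1, DataType::S32));
    dst_q8.allocator()->init(TensorInfo(TensorShape(1U, 1U), 1, DataType::QASYMM8, QuantizationInfo(1.0f, 0)));

    // Function 1: raw int32 GEMM on the quantized inputs.
    NEGEMMLowpMatrixMultiplyCore mm;
    mm.configure(&src1, &src2, nullptr, &dst_s32);

    // Function 2: requantize the int32 accumulators back to QASYMM8.
    // The fixed-point multiplier/shift/offset below are placeholders, not
    // values derived from the real scales.
    GEMMLowpOutputStageInfo os_info{};
    os_info.type                = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
    os_info.gemmlowp_multiplier = 1073741824;
    os_info.gemmlowp_shift      = 1;
    os_info.gemmlowp_offset     = 0;
    os_info.gemmlowp_min_bound  = 0;
    os_info.gemmlowp_max_bound  = 255;
    os_info.output_data_type    = DataType::QASYMM8;

    NEGEMMLowpOutputStage out_stage;
    out_stage.configure(&dst_s32, nullptr, &dst_q8, os_info);

    src1.allocator()->allocate();
    src2.allocator()->allocate();
    dst_s32.allocator()->allocate();
    dst_q8.allocator()->allocate();

    // Fill src1/src2 with quantized test data here (omitted), then run the
    // two functions back to back.
    mm.run();
    out_stage.run();
    return 0;
}
```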
Same question. Most results in the paper come from models with about 320M FLOPs, which makes me think that the bin setting is used to make the models...
> I think we can go with approach 2, and then we complete the calibration process?

Then I will adapt the `quantize` interface first and upload the code.
I have submitted the initial version of the code as a PR to the HF repo, see [here](https://huggingface.co/THUDM/chatglm-6b/discussions/43). In my current tests there is still a residual GPU memory issue: the quantized model's memory usage is still higher than expected, and it only returns to normal after a save & reload. I am still investigating the cause.
@duzx16 The GPU memory issue has been resolved now~
LGTM! Thanks a lot!