
Results 24 comments of chiwwang

We will try to reproduce this on our side.

We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?

Sadly, the segmentation fault in the linear op appeared in the QNN 2.26–2.27 timeframe. The fix is not released yet; the ETA is QNN 2.28, at the end of October.

I suddenly realized this is at the AOT stage, so the mismatch between the QNN libraries and ExecuTorch (maybe QnnPyXXXXX.so) should be caused by a mismatch between QNN_SDK_ROOT and LD_LIBRARY_PATH... not on...
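To make the failure mode concrete, here is a minimal sketch of the kind of consistency check being described: verifying that the QNN library directories on LD_LIBRARY_PATH actually live under QNN_SDK_ROOT. The function name and the path strings are illustrative assumptions, not part of any real tool.

```python
import os

def qnn_paths_consistent(env):
    """Return True if every QNN-looking dir on LD_LIBRARY_PATH lives under QNN_SDK_ROOT.

    `env` is a plain dict standing in for os.environ, so the sketch is testable.
    """
    sdk_root = env.get("QNN_SDK_ROOT", "")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Only consider entries that plausibly come from a QNN SDK install.
    qnn_dirs = [d for d in lib_dirs if "qnn" in d.lower()]
    return bool(sdk_root) and all(d.startswith(sdk_root) for d in qnn_dirs)

# Hypothetical example paths: libs loaded from a different SDK than QNN_SDK_ROOT.
env = {
    "QNN_SDK_ROOT": "/opt/qnn-2.28",
    "LD_LIBRARY_PATH": "/opt/qnn-2.26/lib/x86_64-linux-clang:/usr/lib",
}
print(qnn_paths_consistent(env))  # False: the loader would pick up 2.26 libs
```

A mismatch like this at AOT time would explain loading a QnnPyXXXXX.so built against a different SDK version than expected.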

> I double-checked again and it looks like I can lower matmul in the OSS flow, but not the internal buck flow, I guess I can work around it for now... I'm also...

> > We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?
>
> Hi @cccclai, @chiwwang, ...

Hi @cccclai, I added the SoC here: https://github.com/cccclai/executorch-1/pull/1. I ran a silly model with soc_model=SSG2115P on an SM8550 and it seems OK. I will test the command shared here. [update]...

> Hi @cccclai, I added a PR to quantize the embedding op and 16x8 matmul. I ran this model, and it could fully delegate. If you have any problem, please let...

So it's "custom annotation", based mostly on the topology of the graph, right? We look into the graph and choose a node to annotate, which helps us obtain 16x8...
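A toy sketch of what such topology-driven annotation amounts to: walk the graph, pick out the matmul nodes, and tag them for 16-bit-activation / 8-bit-weight quantization. The node dicts and the `annotate_16a8w_matmuls` helper are hypothetical illustrations, not the real QnnQuantizer annotation API.

```python
def annotate_16a8w_matmuls(graph):
    """graph: list of node dicts with 'op', 'name', and an 'annotation' slot.

    Selects nodes purely by their place/op in the graph topology and tags
    them with a 16a8w (16-bit activation, 8-bit weight) annotation.
    """
    for node in graph:
        if node["op"] == "matmul":          # topology-based node selection
            node["annotation"] = "16a8w"    # mark for 16x8 quantization
    return graph

# Tiny illustrative graph (names are made up).
graph = [
    {"op": "embedding", "name": "tok_emb", "annotation": None},
    {"op": "matmul", "name": "attn_qk", "annotation": None},
    {"op": "softmax", "name": "attn_probs", "annotation": None},
]
annotate_16a8w_matmuls(graph)
print(graph[1]["annotation"])  # 16a8w
```

In a real quantizer the annotation would carry full quantization specs (dtype, qscheme, observers) rather than a string, but the selection logic follows the same pattern.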

Got it, thanks. Note that the command should contain **--soc_model SSG2115P** for the correct VTCM size (needs PR https://github.com/cccclai/executorch-1/pull/1, though):

```
python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w --soc_model SSG2115P
```