
Results 24 comments of chiwwang

We will try to reproduce this on our side.

We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?

Sadly, the segmentation fault in the linear op appeared in the QNN 2.26–2.27 timeframe. The fix is not released yet; the ETA is QNN 2.28, at the end of October.

I suddenly realized this is at the AOT stage, so the mismatch between the QNN libraries and ExecuTorch (maybe QnnPyXXXXX.so) should be caused by a mismatch between QNN_SDK_ROOT and LD_LIBRARY_PATH... not on...
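To make the failure mode concrete, here is a minimal sketch of the kind of consistency check being described: verifying that the QNN library directories on LD_LIBRARY_PATH actually live under QNN_SDK_ROOT. The function name and the path strings are illustrative assumptions, not part of any real tool.

```python
import os

def qnn_paths_consistent(env):
    """Return True if every QNN-looking dir on LD_LIBRARY_PATH lives under QNN_SDK_ROOT.

    `env` is a plain dict standing in for os.environ, so the sketch is testable.
    """
    sdk_root = env.get("QNN_SDK_ROOT", "")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Only consider entries that plausibly come from a QNN SDK install.
    qnn_dirs = [d for d in lib_dirs if "qnn" in d.lower()]
    return bool(sdk_root) and all(d.startswith(sdk_root) for d in qnn_dirs)

# Hypothetical example paths: libs loaded from a different SDK than QNN_SDK_ROOT.
env = {
    "QNN_SDK_ROOT": "/opt/qnn-2.28",
    "LD_LIBRARY_PATH": "/opt/qnn-2.26/lib/x86_64-linux-clang:/usr/lib",
}
print(qnn_paths_consistent(env))  # False: the loader would pick up 2.26 libs
```

A mismatch like this at AOT time would explain loading a QnnPyXXXXX.so built against a different SDK version than expected.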

> I double-checked again and it looks like I can lower matmul in the OSS flow, but not the internal buck flow, I guess I can work around it for now... I'm also...

> > We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?
>
> Hi @cccclai, @chiwwang, ...

Hi @cccclai, I added the SoC here: https://github.com/cccclai/executorch-1/pull/1. I ran a silly model with soc_model=SSG2115P on an SM8550 and it seems OK. I will test the command shared here. [update]...

> Hi @cccclai, I added a PR to quantize the embedding op and 16x8 matmul. I ran this model, and it could fully delegate. If you have any problem, please let...

So it's "custom annotation", based mostly on the topology of the graph, right? We look into the graph and choose a node to annotate, which helps us obtain 16x8...
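A toy sketch of what such topology-driven annotation amounts to: walk the graph, pick out the matmul nodes, and tag them for 16-bit-activation / 8-bit-weight quantization. The node dicts and the `annotate_16a8w_matmuls` helper are hypothetical illustrations, not the real QnnQuantizer annotation API.

```python
def annotate_16a8w_matmuls(graph):
    """graph: list of node dicts with 'op', 'name', and an 'annotation' slot.

    Selects nodes purely by their place/op in the graph topology and tags
    them with a 16a8w (16-bit activation, 8-bit weight) annotation.
    """
    for node in graph:
        if node["op"] == "matmul":          # topology-based node selection
            node["annotation"] = "16a8w"    # mark for 16x8 quantization
    return graph

# Tiny illustrative graph (names are made up).
graph = [
    {"op": "embedding", "name": "tok_emb", "annotation": None},
    {"op": "matmul", "name": "attn_qk", "annotation": None},
    {"op": "softmax", "name": "attn_probs", "annotation": None},
]
annotate_16a8w_matmuls(graph)
print(graph[1]["annotation"])  # 16a8w
```

In a real quantizer the annotation would carry full quantization specs (dtype, qscheme, observers) rather than a string, but the selection logic follows the same pattern.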

Got it, thanks. Note that the command should contain **--soc_model SSG2115P** for the correct VTCM size (needs PR https://github.com/cccclai/executorch-1/pull/1, though):

```
python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w --soc_model SSG2115P
```