
Results 217 comments of cccclai

I updated the PR to use the linear-to-conv pass now, since the segfault can be reproduced. Here is the latest log [prefill_qnn.log](https://github.com/user-attachments/files/17235899/prefill_qnn.log) I can see matmul fails to lower...
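For context on the linear-to-conv pass mentioned above: a Linear layer is numerically equivalent to a 1x1 Conv2d when the input is reshaped to NCHW with unit spatial dims, which is why some backends rewrite one as the other. A minimal numpy sketch of the equivalence (not the actual ExecuTorch pass):

```python
import numpy as np

# Linear: y = x @ W.T + b is equivalent to a 1x1 conv where
# in_features become channels and H = W = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float32)   # (batch, in_features)
W = rng.standard_normal((4, 8)).astype(np.float32)   # (out_features, in_features)
b = rng.standard_normal(4).astype(np.float32)

linear_out = x @ W.T + b                              # plain Linear

# 1x1 "conv": contract over the channel dim of the NCHW input.
x_nchw = x.reshape(2, 8, 1, 1)
conv_out = np.einsum("nchw,oc->nohw", x_nchw, W) + b.reshape(1, 4, 1, 1)

assert np.allclose(linear_out, conv_out.reshape(2, 4), atol=1e-4)
```

The rewrite changes only the graph representation, not the numerics, so a segfault after the pass points at the backend's conv lowering rather than the math.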

> I suddenly realize this is in AOT stage so the mismatch of QNN libraries & executorch (Maybe QnnPyXXXXX.so) should be caused by the mismatch of QNN_SDK_ROOT and LD_LIBRARY_PATH... not...

I double-checked again, and it looks like I can lower matmul in the OSS flow but not the internal Buck flow. I guess I can work around it for now...

Thanks folks! I was able to get the model running with embedding/matmul lowered with these changes. Maybe we can extend the SoC table? The change looks reasonable to me.

**layer norm op lowering:** We have a different model that uses layernorm instead of rmsnorm. Because the runtime only recently bumped to 2.25 and the current model still uses layernorm, I'll make...
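For anyone unfamiliar with why layernorm and rmsnorm need separate lowering support, a minimal sketch of the two ops (unit gain, zero bias for brevity; this is not the QNN implementation):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm: subtract the mean, then scale by the standard deviation.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm: no mean subtraction, scale by root-mean-square only.
    # The missing mean subtraction is what makes it a distinct op
    # that a backend has to support separately.
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return x / rms
```

The two coincide only for zero-mean inputs, so a backend that lowers rmsnorm cannot silently reuse it for layernorm.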

In the meantime, we're tracking latency (both model loading time and inference time), memory, power, and accuracy for production. Latency and accuracy are easier; how about memory and power?

> Hi @cccclai I [add a PR ](https://github.com/cccclai/executorch-1/pull/2)to quantize embedding op and 16x8 matmul. I ran this model, and it could fully delegate. If you have any problem, please let...

> Oh~ sure, let me add more descriptions for [this PR](https://github.com/cccclai/executorch-1/pull/2) About 16x8 matmul op, I think it can be divided into two types according to whether to use kv...
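To illustrate the "16x8 matmul" the PR above refers to (16-bit activations with 8-bit weights), here is a generic affine-quantization sketch. This is only an illustration of the numeric scheme, not the actual ExecuTorch/QNN quantizer API:

```python
import numpy as np

def quantize(x, num_bits):
    # Symmetric per-tensor quantization to a signed num_bits integer grid.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
act = rng.standard_normal((1, 16)).astype(np.float32)
wts = rng.standard_normal((16, 16)).astype(np.float32)

qa, sa = quantize(act, 16)   # int16 activations: fine-grained grid
qw, sw = quantize(wts, 8)    # int8 weights: coarse grid, smaller footprint

# Integer matmul, then a single float rescale at the end.
deq = (qa @ qw).astype(np.float32) * (sa * sw)
err = np.abs(deq - act @ wts).max()   # dominated by the 8-bit weight grid
```

The 16-bit activation side keeps rounding error small where dynamic range matters (e.g. kv-cache values), while 8-bit weights keep the memory cost down, which matches the kv-cache / non-kv-cache split described in the quoted comment.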

Hi team, I added a FastGelu example, but I didn't use HTP intrinsics, so the perf is still not optimized. I'd like to know where to put these examples.
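For reference, FastGelu usually means the tanh approximation of GELU. A scalar reference version (an HTP-optimized kernel would vectorize this with intrinsics; the exact-GELU comparison is just for sanity checking):

```python
import math

def fast_gelu(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def exact_gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF via erf.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

The approximation stays within about 1e-3 of exact GELU over typical activation ranges, which is why it is the common choice for fast kernels.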