
Results 217 comments of cccclai

I updated the PR to use the linear-to-conv pass now, since the segfault can be reproduced. Here is the latest log [prefill_qnn.log](https://github.com/user-attachments/files/17235899/prefill_qnn.log) I can see matmul fails to lower...
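For context on the linear-to-conv pass mentioned above: a Linear layer is numerically equivalent to a 1x1 Conv2d when the input is reshaped to NCHW with unit spatial dims, which is why some backends rewrite one as the other. A minimal numpy sketch of the equivalence (not the actual ExecuTorch pass):

```python
import numpy as np

# Linear: y = x @ W.T + b is equivalent to a 1x1 conv where
# in_features become channels and H = W = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float32)   # (batch, in_features)
W = rng.standard_normal((4, 8)).astype(np.float32)   # (out_features, in_features)
b = rng.standard_normal(4).astype(np.float32)

linear_out = x @ W.T + b                              # plain Linear

# 1x1 "conv": contract over the channel dim of the NCHW input.
x_nchw = x.reshape(2, 8, 1, 1)
conv_out = np.einsum("nchw,oc->nohw", x_nchw, W) + b.reshape(1, 4, 1, 1)

assert np.allclose(linear_out, conv_out.reshape(2, 4), atol=1e-4)
```

The rewrite changes only the graph representation, not the numerics, so a segfault after the pass points at the backend's conv lowering rather than the math.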

> I suddenly realize this is in AOT stage so the mismatch of QNN libraries & executorch (Maybe QnnPyXXXXX.so) should be caused by the mismatch of QNN_SDK_ROOT and LD_LIBRARY_PATH... not...

I double-checked again, and it looks like I can lower matmul in the OSS flow but not the internal Buck flow. I guess I can work around it for now...

Thanks folks! I was able to get the model running with embedding/matmul lowered with these changes. Maybe we can extend the SoC table? The change looks reasonable to me.

**layer norm op lowering:** We have a different model that uses layernorm instead of rmsnorm. Because the runtime only recently bumped to 2.25 and the current model still uses layernorm, I'll make...
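For anyone unfamiliar with why layernorm and rmsnorm need separate lowering support, a minimal sketch of the two ops (unit gain, zero bias for brevity; this is not the QNN implementation):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm: subtract the mean, then scale by the standard deviation.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm: no mean subtraction, scale by root-mean-square only.
    # The missing mean subtraction is what makes it a distinct op
    # that a backend has to support separately.
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return x / rms
```

The two coincide only for zero-mean inputs, so a backend that lowers rmsnorm cannot silently reuse it for layernorm.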

In the meantime, we're tracking latency (both model loading time and inference time), memory, power, and accuracy for production. Latency and accuracy are easier; how about memory and power?

> Hi @cccclai I [add a PR ](https://github.com/cccclai/executorch-1/pull/2)to quantize embedding op and 16x8 matmul. I ran this model, and it could fully delegate. If you have any problem, please let...

> Oh~ sure, let me add more descriptions for [this PR](https://github.com/cccclai/executorch-1/pull/2) About 16x8 matmul op, I think it can be divided into two types according to whether to use kv...
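To illustrate the "16x8 matmul" the PR above refers to (16-bit activations with 8-bit weights), here is a generic affine-quantization sketch. This is only an illustration of the numeric scheme, not the actual ExecuTorch/QNN quantizer API:

```python
import numpy as np

def quantize(x, num_bits):
    # Symmetric per-tensor quantization to a signed num_bits integer grid.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
act = rng.standard_normal((1, 16)).astype(np.float32)
wts = rng.standard_normal((16, 16)).astype(np.float32)

qa, sa = quantize(act, 16)   # int16 activations: fine-grained grid
qw, sw = quantize(wts, 8)    # int8 weights: coarse grid, smaller footprint

# Integer matmul, then a single float rescale at the end.
deq = (qa @ qw).astype(np.float32) * (sa * sw)
err = np.abs(deq - act @ wts).max()   # dominated by the 8-bit weight grid
```

The 16-bit activation side keeps rounding error small where dynamic range matters (e.g. kv-cache values), while 8-bit weights keep the memory cost down, which matches the kv-cache / non-kv-cache split described in the quoted comment.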

Hi team, I added a FastGelu example, but I didn't use HTP intrinsics, so the perf is still not optimized. I'd like to know where to put these examples.
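For reference, FastGelu usually means the tanh approximation of GELU. A scalar reference version (an HTP-optimized kernel would vectorize this with intrinsics; the exact-GELU comparison is just for sanity checking):

```python
import math

def fast_gelu(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def exact_gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF via erf.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

The approximation stays within about 1e-3 of exact GELU over typical activation ranges, which is why it is the common choice for fast kernels.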