cccclai
Can you also update these?

```diff
--- a/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
+++ b/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
@@ -73,7 +73,13 @@
   cache->GetQnnContextBlob();
   // memfd_create on android api level 30 and above
-  int fd = memfd_create("tmp.dlc", 0);
...
```
hmm seems like there is a merge conflict, can you rebase?
I'm out of office and don't have access for now. @kirklandsign can you help a bit?
> > > @shewu-quic great job! does it support llama2 7b?

Unfortunately, it does not support llama2 7b in this draft, but we are...
> If I add the following, it will get reasonable English sentences in the quantized model.

Ah yes, we will use a more generic approach to calibrate. I merged this PR (https://github.com/pytorch/executorch/pull/3756)...
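(For reference, a rough sketch of what a generic calibration pass could look like; `prepared_model`, `tokenizer`, and `prompts` are placeholders here, not the actual code from the PR.)

```python
import torch

def calibrate(prepared_model, tokenizer, prompts, seq_len=128):
    # Run representative prompts through the observer-instrumented model so the
    # quantization observers record realistic activation ranges before convert.
    with torch.no_grad():
        for prompt in prompts:
            ids = tokenizer.encode(prompt)[:seq_len]
            tokens = torch.tensor([ids], dtype=torch.long)
            prepared_model(tokens)
```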
> shard

Sorry for the delay, I was distracted by the performance review last week... I use the ExecutorBackend and tag every 8 layers; will publish soon. I think having a noop...
This is my current change; still trying to debug an op, but it's getting close: [model_sharding.patch](https://github.com/user-attachments/files/16072311/model_sharding.patch)

This is pretty much the idea. I think it's still worth exploring the...
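(To make the "tag every 8 layers" idea concrete, here is a rough sketch of bucketing nodes into shards from the exported graph. The meta key, the regex on the module path, and `shard_tag` are assumptions for illustration, not the patch itself.)

```python
import re

LAYERS_PER_SHARD = 8  # per the "tag every 8 layers" idea above

def layer_index(node):
    # Best-effort: recover the decoder-layer index from the node's module stack.
    for value in node.meta.get("nn_module_stack", {}).values():
        path = value[0] if isinstance(value, tuple) else value
        m = re.search(r"layers[._](\d+)", str(path))
        if m:
            return int(m.group(1))
    return None

def tag_shards(graph_module):
    # Attach a shard tag to every node inside a decoder layer so a partitioner
    # can later lower each group of layers as its own delegate payload.
    for node in graph_module.graph.nodes:
        idx = layer_index(node)
        if idx is not None:
            node.meta["shard_tag"] = f"shard_{idx // LAYERS_PER_SHARD}"
```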
This is great. I think if we have a custom graph break op, it doesn't have to be QNN-specific and can be applicable to other flows or backends.

> But...
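(A minimal sketch of what such a backend-agnostic graph-break op could look like; the `sharding` namespace and op name are made up for illustration.)

```python
from torch.library import Library, impl

# Hypothetical "sharding::graph_break" op: an identity at runtime whose only job
# is to sit in the graph as a marker that any backend's partitioner can split on.
_lib = Library("sharding", "DEF")
_lib.define("graph_break(Tensor x) -> Tensor")

@impl(_lib, "graph_break", "CompositeExplicitAutograd")
def _graph_break(x):
    return x.clone()
```

The model would then call `torch.ops.sharding.graph_break(hidden_states)` between layer groups, and each backend's partitioner simply never claims that node.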
> The last node of the layer is an add node. However, you could find #L466 and #L470, which have the same source_fn and module stack. So maybe I also need...
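(To illustrate the boundary question: since several trailing nodes can share the same source_fn and module stack, one hedged way to find the "last node of a layer" is to pick the node whose users all belong to a different layer, reusing the hypothetical `layer_index` helper from the sharding sketch above.)

```python
def is_last_node_of_layer(node):
    # A node closes its layer when it has a layer index and none of its users
    # are in the same layer; several nodes may share the same source_fn and
    # module stack, but only the final one satisfies this check.
    idx = layer_index(node)
    if idx is None:
        return False
    return all(layer_index(user) != idx for user in node.users)
```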
> > Yeah, that's my understanding too. However, for 4 shards, we need to init(shard_1) -> destroy(shard_1) -> init(shard_2) -> destroy(shard_2) -> ...; if we do init(shard_1) -> init(shard_2)...
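(A small sketch of the sequential flavor of that ordering; `load_shard`, `forward`, and `destroy` are placeholder helpers, not real API. The point is that only one shard's context is alive at a time, so peak memory is roughly one shard rather than the sum of all four.)

```python
def run_sharded(shard_paths, tokens):
    # load_shard / destroy are placeholders for however each shard program is
    # loaded and torn down; hidden states flow from one shard to the next.
    hidden = tokens
    for path in shard_paths:
        shard = load_shard(path)        # init(shard_i)
        hidden = shard.forward(hidden)
        shard.destroy()                 # destroy(shard_i) before the next init
    return hidden
```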