cccclai

Results: 217 comments by cccclai

Can you also update these?

```diff
--- a/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
+++ b/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
@@ -73,7 +73,13 @@
   cache->GetQnnContextBlob();
   // memfd_create on android api level 30 and above
-  int fd = memfd_create("tmp.dlc", 0);
...
```

Hmm, seems like there is a merge conflict, can you rebase?

I'm out of office and don't have access for now. @kirklandsign can you help a bit?

> > > @shewu-quic great job! does it support llama2 7b?
> > >
> > > Unfortunately, it does not support llama2 7b in this draft, but we are...

> If I add the following, it will get reasonable English sentences in the quantized model.

Ah yes, we will use a more generic approach to calibrate. I merged this PR (https://github.com/pytorch/executorch/pull/3756)...

> shard

Sorry for the delay, I was distracted by the performance review last week... I use the ExecutorBackend and tag every 8 layers; will publish soon. I think having a noop...
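For context, a minimal sketch of what "tag every 8 layers" could look like on an exported graph, assuming the decoder-layer index can be recovered from each node's `nn_module_stack` metadata. The regex and helper names here are illustrative, not the actual patch:

```python
import re

SHARD_SIZE = 8  # decoder layers per shard, per the comment above

def layer_index(node):
    """Recover a decoder-layer index from nn_module_stack metadata
    (a dict of {key: (module_path, module_type)} in exported graphs).
    Returns None for nodes outside any layer (embedding, lm head, ...)."""
    for path, _type in node.meta.get("nn_module_stack", {}).values():
        m = re.search(r"layers[._](\d+)", str(path))
        if m:
            return int(m.group(1))
    return None

def tag_shards(graph_module):
    """Stamp each node with a shard tag so a tag-based partitioner can
    later group nodes sharing a tag into one delegated payload."""
    for node in graph_module.graph.nodes:
        idx = layer_index(node)
        if idx is not None:
            node.meta["delegation_tag"] = f"shard_{idx // SHARD_SIZE}"
```

A tag-based partitioner can then collect all nodes sharing a `delegation_tag` into the same lowered submodule, giving one context per shard.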

This is my current change; still trying to debug an op, but it's getting close: [model_sharding.patch](https://github.com/user-attachments/files/16072311/model_sharding.patch)

This is pretty much the idea: ![image](https://github.com/pytorch/executorch/assets/16430979/c3034d38-9d02-4b7c-84db-faef27343649)

I think it's still worth exploring the...

This is great. I think if we have a custom graph break op, it doesn't have to be QNN-specific and can be applicable to other flows or backends.

> But...
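As a rough illustration of such a backend-agnostic graph break op, one could register a runtime no-op whose only purpose is to give a partitioner an explicit split point. The namespace, op name, and wiring below are hypothetical, not the actual implementation:

```python
import torch

# Hypothetical namespace and op name; the real patch may differ.
lib = torch.library.Library("sharding", "DEF")
lib.define("graph_break(Tensor x) -> Tensor")

def _graph_break(x: torch.Tensor) -> torch.Tensor:
    # Runtime no-op: the op exists only so the exported graph carries an
    # explicit, backend-agnostic marker where a partitioner may split.
    return x

lib.impl("graph_break", _graph_break, "CompositeExplicitAutograd")

# Usage inside the model: mark a shard boundary between layer groups, e.g.
#   h = torch.ops.sharding.graph_break(h)
```

Because the op is a plain identity, any backend that doesn't understand it can simply leave it in the graph or fold it away.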

> The last node of the layer is an add node. However, you can find #L466 and #L470, which have the same source_fn and module stack. So maybe I also need...

> > Yeah, that's my understanding too. However, for 4 shards we need init(shard_1) -> destroy(shard_1) -> init(shard_2) -> destroy(shard_2) -> ...; if we do init(shard_1) -> init(shard_2)...
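To make the trade-off being discussed concrete, here is a schematic sketch (all names hypothetical) of the two orderings: destroying each shard's context before initializing the next keeps only one context resident at a time, while initializing all shards up front avoids per-pass init cost at the price of peak memory:

```python
def run_sequential(shards, x):
    # init -> run -> destroy per shard: peak memory is roughly one
    # context, but init/destroy cost is paid on every forward pass.
    for shard in shards:
        ctx = shard.init()   # hypothetical: create backend context
        x = ctx.execute(x)
        ctx.destroy()
    return x

def run_resident(shards, x):
    # init all up front: no per-pass init cost, but every context
    # stays resident simultaneously (higher peak memory).
    ctxs = [s.init() for s in shards]
    for ctx in ctxs:
        x = ctx.execute(x)
    return x
```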