cccclai
Can you also update these?

```diff
--- a/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
+++ b/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
@@ -73,7 +73,13 @@
   cache->GetQnnContextBlob();
   // memfd_create on android api level 30 and above
-  int fd = memfd_create("tmp.dlc", 0);
...
```
hmm seems like there is a merge conflict, can you rebase?
I'm out of office and don't have access for now. @kirklandsign can you help a bit?
> > > @shewu-quic great job! does it support llama2 7b?

Unfortunately, it does not support llama2 7b in this draft, but we are...
> If I add the following, it will get reasonable English sentences in the quantized model.

Ah yes, we will use a more generic approach to calibrate. I merged this PR (https://github.com/pytorch/executorch/pull/3756)...
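(For reference, a rough sketch of what a generic calibration pass could look like; `prepared_model`, `tokenizer`, and `prompts` are placeholders here, not the actual code from the PR.)

```python
import torch

def calibrate(prepared_model, tokenizer, prompts, seq_len=128):
    # Run representative prompts through the observer-instrumented model so the
    # quantization observers record realistic activation ranges before convert.
    with torch.no_grad():
        for prompt in prompts:
            ids = tokenizer.encode(prompt)[:seq_len]
            tokens = torch.tensor([ids], dtype=torch.long)
            prepared_model(tokens)
```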
> shard

Sorry for the delay, I was distracted by the performance review last week... I use the ExecutorBackend and tag every 8 layers; will publish soon. I think having a noop...
This is my current change; still trying to debug an op, but it's getting close: [model_sharding.patch](https://github.com/user-attachments/files/16072311/model_sharding.patch)

This is pretty much the idea. I think it's still worth exploring the...
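(To make the "tag every 8 layers" idea concrete, here is a rough sketch of bucketing nodes into shards from the exported graph. The meta key, the regex on the module path, and `shard_tag` are assumptions for illustration, not the patch itself.)

```python
import re

LAYERS_PER_SHARD = 8  # per the "tag every 8 layers" idea above

def layer_index(node):
    # Best-effort: recover the decoder-layer index from the node's module stack.
    for value in node.meta.get("nn_module_stack", {}).values():
        path = value[0] if isinstance(value, tuple) else value
        m = re.search(r"layers[._](\d+)", str(path))
        if m:
            return int(m.group(1))
    return None

def tag_shards(graph_module):
    # Attach a shard tag to every node inside a decoder layer so a partitioner
    # can later lower each group of layers as its own delegate payload.
    for node in graph_module.graph.nodes:
        idx = layer_index(node)
        if idx is not None:
            node.meta["shard_tag"] = f"shard_{idx // LAYERS_PER_SHARD}"
```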
This is great. I think if we have a custom graph break op, it doesn't have to be QNN-specific and can be applicable to other flows or backends.

> But...
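(A minimal sketch of what such a backend-agnostic graph-break op could look like; the `sharding` namespace and op name are made up for illustration.)

```python
from torch.library import Library, impl

# Hypothetical "sharding::graph_break" op: an identity at runtime whose only job
# is to sit in the graph as a marker that any backend's partitioner can split on.
_lib = Library("sharding", "DEF")
_lib.define("graph_break(Tensor x) -> Tensor")

@impl(_lib, "graph_break", "CompositeExplicitAutograd")
def _graph_break(x):
    return x.clone()
```

The model would then call `torch.ops.sharding.graph_break(hidden_states)` between layer groups, and each backend's partitioner simply never claims that node.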
> The last node of the layer is an add node. However, you could find #L466 and #L470, which have the same source_fn and module stack. So maybe I also need...
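(To illustrate the boundary question: since several trailing nodes can share the same source_fn and module stack, one hedged way to find the "last node of a layer" is to pick the node whose users all belong to a different layer, reusing the hypothetical `layer_index` helper from the sharding sketch above.)

```python
def is_last_node_of_layer(node):
    # A node closes its layer when it has a layer index and none of its users
    # are in the same layer; several nodes may share the same source_fn and
    # module stack, but only the final one satisfies this check.
    idx = layer_index(node)
    if idx is None:
        return False
    return all(layer_index(user) != idx for user in node.users)
```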
> > Yeah, that's my understanding too. However, for 4 shards, we need to init(shard_1) -> destroy(shard_1) -> init(shard_2) -> destroy(shard_2) -> ...; if we do init(shard_1) -> init(shard_2)...
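(A small sketch of the sequential flavor of that ordering; `load_shard`, `forward`, and `destroy` are placeholder helpers, not real API. The point is that only one shard's context is alive at a time, so peak memory is roughly one shard rather than the sum of all four.)

```python
def run_sharded(shard_paths, tokens):
    # load_shard / destroy are placeholders for however each shard program is
    # loaded and torn down; hidden states flow from one shard to the next.
    hidden = tokens
    for path in shard_paths:
        shard = load_shard(path)        # init(shard_i)
        hidden = shard.forward(hidden)
        shard.destroy()                 # destroy(shard_i) before the next init
    return hidden
```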