How to deploy llama2 on Qualcomm Snapdragon chips through ExecuTorch?
Excuse me, if I want to deploy llama2 on a Qualcomm Snapdragon chip through ExecuTorch and use the NPU as the inference compute unit, what do I need to do?
The module I'm currently using is the SG885G-WF: https://www.quectel.com/product/wi-fi-bt-sg885g-wf-smart-module
Thanks for the request! We are working on this and will get back to you when there's something we can share.
@iseeyuan Is there a rough schedule or timeline for this? Qualcomm has announced plans to make Llama 2-based AI implementations available on flagship smartphones starting in 2024.
Will ExecuTorch's "varied input len" support work within QNN's constraints?
Do you mind sharing any ideas? :blush:
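On the varied-input-length point: QNN typically compiles graphs with static shapes, so one common workaround (not necessarily what the ExecuTorch team will ship) is to pad every prompt to a fixed maximum length and mask out the padding. A stdlib-only sketch of that idea, where `PAD_ID` and `MAX_LEN` are made-up values for illustration:

```python
# Illustrative only: pads token sequences to a fixed length so a
# static-shape backend (like QNN) can reuse one compiled graph.
from typing import List, Tuple

PAD_ID = 0      # assumed padding token id
MAX_LEN = 16    # assumed fixed sequence length compiled into the graph

def pad_to_fixed(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Return (padded_tokens, attention_mask), both of length MAX_LEN."""
    if len(tokens) > MAX_LEN:
        raise ValueError("prompt longer than the compiled sequence length")
    pad = MAX_LEN - len(tokens)
    return tokens + [PAD_ID] * pad, [1] * len(tokens) + [0] * pad

ids, mask = pad_to_fixed([5, 7, 11])
```

The trade-off is wasted compute on padding positions; another common approach is compiling a few graphs at different bucket lengths and picking the smallest one that fits.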
@iseeyuan can you update this issue please?
It's the same as https://github.com/pytorch/executorch/issues/3586; it's a work in progress. From @cccclai:
We only have enablement for the small stories models right now. We're actively working on enabling llama2 and improving the performance numbers.