How to deploy llama2 on Qualcomm Snapdragon chips through ExecuTorch?
Excuse me, if I want to deploy llama2 on a Qualcomm Snapdragon chip through ExecuTorch and use the NPU as the inference compute unit, what do I need to do?
The module I'm currently using is the SG885G-WF: https://www.quectel.com/product/wi-fi-bt-sg885g-wf-smart-module
Thanks for the request! We are working on this and will get back to you when there's something we can share.
@iseeyuan Is there a rough schedule or timeline for this? Qualcomm has announced plans to make Llama 2-based AI implementations available on flagship smartphones starting in 2024.
Will ExecuTorch's "varied input len" support work within QNN's constraints?
Do you mind sharing any ideas? :blush:
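On the varied-input-length point: QNN typically compiles graphs with static shapes, so one common workaround (not necessarily what the ExecuTorch team will ship) is to pad every prompt to a fixed maximum length and mask out the padding. A stdlib-only sketch of that idea, where `PAD_ID` and `MAX_LEN` are made-up values for illustration:

```python
# Illustrative only: pads token sequences to a fixed length so a
# static-shape backend (like QNN) can reuse one compiled graph.
from typing import List, Tuple

PAD_ID = 0      # assumed padding token id
MAX_LEN = 16    # assumed fixed sequence length compiled into the graph

def pad_to_fixed(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Return (padded_tokens, attention_mask), both of length MAX_LEN."""
    if len(tokens) > MAX_LEN:
        raise ValueError("prompt longer than the compiled sequence length")
    pad = MAX_LEN - len(tokens)
    return tokens + [PAD_ID] * pad, [1] * len(tokens) + [0] * pad

ids, mask = pad_to_fixed([5, 7, 11])
```

The trade-off is wasted compute on padding positions; another common approach is compiling a few graphs at different bucket lengths and picking the smallest one that fits.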
@iseeyuan can you update this issue please?
It's the same as https://github.com/pytorch/executorch/issues/3586; it's a work in progress. From @cccclai:
We only have enablement for the small stories models right now. We're actively working on enabling llama2 and improving the performance numbers.