cccclai
The difference between the Lite Interpreter (PyTorch Mobile) and ExecuTorch is that, in ExecuTorch, we plan memory ahead of time, which helps us reuse buffers and reduce memory usage at runtime. What...
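To illustrate the ahead-of-time memory-planning idea, here is a minimal sketch (not the actual ExecuTorch API): given each tensor's size and first/last-use lifetime, a greedy first-fit planner assigns offsets into a single arena, reusing space once a tensor is no longer live.

```python
# Simplified illustration of ahead-of-time memory planning: tensors whose
# lifetimes do not overlap can share the same region of one pre-sized arena.

def plan_memory(tensors):
    """tensors: list of (name, size, first_use, last_use).
    Returns (total_arena_size, {name: offset})."""
    live = []       # (offset, size, last_use) of currently live blocks
    offsets = {}
    arena = 0
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        # Drop blocks whose tensor died before this one is created.
        live = [blk for blk in live if blk[2] >= first]
        # First-fit: find the lowest gap between live blocks that fits `size`.
        cursor = 0
        for off, sz, _ in sorted(live):
            if off - cursor >= size:
                break
            cursor = off + sz
        offsets[name] = cursor
        live.append((cursor, size, last))
        arena = max(arena, cursor + size)
    return arena, offsets

# Three 4-byte tensors; "a" dies before "c" is created, so "c" reuses its slot,
# and the arena is 8 bytes instead of the naive 12.
arena, offsets = plan_memory([("a", 4, 0, 1), ("b", 4, 1, 2), ("c", 4, 2, 3)])
```

Because the plan is computed at export time, the runtime only needs one pre-allocated arena and performs no dynamic allocation per inference.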
> Why is delegation (e.g. via XNNPACK) necessary in order to lower a model onto an edge device using ExecuTorch? Delegation is for delegating part of, or the whole, model...
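Conceptually, delegation works by partitioning the model's graph: subgraphs made of ops the backend (e.g. XNNPACK) supports are handed to the delegate, and everything else falls back to the portable runtime. A hypothetical sketch (the op set and function names here are illustrative, not ExecuTorch's real partitioner):

```python
# Illustrative-only set of ops a backend claims to support.
SUPPORTED_BY_BACKEND = {"conv2d", "linear", "relu", "add"}

def partition(ops):
    """ops: ordered list of op names for a (flattened) model.
    Groups consecutive ops into ('delegate', [...]) segments that run on
    the backend, and ('portable', [...]) segments that fall back."""
    segments = []
    for op in ops:
        target = "delegate" if op in SUPPORTED_BY_BACKEND else "portable"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)   # extend the current segment
        else:
            segments.append((target, [op]))  # start a new segment
    return segments

segs = partition(["conv2d", "relu", "softmax", "linear"])
```

The real partitioners also account for data movement between segments, which is why delegating the whole model, when possible, usually performs best.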
The perf numbers shared in https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md#performance are purely CPU, using the XNNPACK backend. Lowering to QCOM HTP is still ongoing, and we only have enablement for the small stories...
@CHNtentes Are you looking to run the llama2 model via HTP or GPU?
Thanks for the update! Overall it looks good. One remaining issue is the allocator - I'd like to see if we can do it in a more systematic way...
The changes in the root cmake and the buffer allocator look good! I marked them as resolved. The only one left is the destroy function, and then we're good.
Many thanks to the MTK team for making the contribution. I'll go ahead and merge it. We just need to wait a bit for the CI signal.
Looks like there are some lint errors. Could you address them? Here is the failing job: https://github.com/pytorch/executorch/actions/runs/10296603326/job/28529110807?pr=3571
Fixed in https://github.com/pytorch/executorch/pull/4473
This tutorial might be helpful https://pytorch.org/executorch/main/sdk-profiling.html cc: @tarun292