cccclai
I was able to repro the fp version on my side, but for the 8a8w version, I hit a model loading error:
```
[ERROR] [Qnn ExecuTorch]: Skel failed to process context...
```
turns out I forgot the `-ptq` flag... I can repro both fp and 8a8w now. What does the performance look like on your side? From the log output, it seems like 1-2...
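For context, a minimal sketch of what the corrected export invocation might look like; the checkpoint/params paths are hypothetical, and the exact `--ptq` spelling is assumed from the llama2 export script rather than quoted from this thread:
```
# Hypothetical paths; adjust to your local checkpoint and params files.
# Assumes the script accepts --qnn for the QNN backend and --ptq 8a8w
# for 8-bit activation / 8-bit weight post-training quantization.
python -m examples.models.llama2.export_llama \
  -c llama2.pt -p params.json -kv \
  --qnn --ptq 8a8w
```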
2~3 toks/s for 8a8w still seems really slow - do we know which part is causing the perf regression? Does the delegated part run reasonably fast while the CPU part is...
> Dear @shewu-quic @cccclai,
>
> does [PR 3196](https://github.com/pytorch/executorch/pull/3196) resolve the issue #2590? If so, I will close the issue. Thank you in advance!

Thanks for the update and sending...
Are you trying to lower the model to CoreML by passing `--coreml`? We're still actively working on enabling llama2 7b with CoreML. The xnnpack backend is ready for llama2 7b...
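As a rough sketch, exporting for XNNPACK instead might look like the following; the `-X` flag and the file paths are assumptions about the export script, not something confirmed in this thread:
```
# Hypothetical invocation; -X is assumed to select the XNNPACK backend.
python -m examples.models.llama2.export_llama \
  -c llama2.pt -p params.json -kv -X
```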
xnnpack (https://github.com/google/XNNPACK) is a software library with a set of highly optimized CPU operators. It works on iOS too. Regarding the CoreML questions, I'd defer to @cymbalrush and @YifanShenSZ to...
@Jack-Khuu is the on-device evaluation ready? edit: Actually CoreML should be able to run on Mac too, @antmikinka are you looking for on-device evaluation, or just evaluating the CoreML...
I think it's related to how we expect eval to work with a delegated model, in this case CoreML.
I checked out this PR and ran
```
git submodule sync
git submodule update --init
./backends/apple/mps/install_requirements.sh
python -m examples.models.llama2.export_llama -kv --mps
```
but still can't repro...
Have it working with this patch:
```
diff --git a/backends/apple/mps/partition/mps_partitioner.py b/backends/apple/mps/partition/mps_partitioner.py
index e5497389d..8e22169c0 100644
--- a/backends/apple/mps/partition/mps_partitioner.py
+++ b/backends/apple/mps/partition/mps_partitioner.py
@@ -43,12 +43,6 @@ class MPSOperatorSupport(OperatorSupportBase):
         self.edge_program = edge_program

     def is_node_supported(self, submodules,...
```
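To try the same workaround locally, one way is to save the diff to a file and apply it with git; `mps_partitioner.patch` is just an illustrative filename:
```
# From the executorch repo root; the patch filename is hypothetical.
git apply mps_partitioner.patch
```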