cccclai
I was able to repro the fp version on my side, but for the 8a8w version, I hit a model loading error:
```
[ERROR] [Qnn ExecuTorch]: Skel failed to process context...
```
turns out I forgot the `-ptq` flag... I can repro both fp and 8a8w now. What does the performance look like on your side? From the log output, it seems like 1-2...
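For context, a minimal sketch of what the corrected export invocation might look like; the checkpoint/params paths are hypothetical, and the exact `--ptq` spelling is assumed from the llama2 export script rather than quoted from this thread:
```
# Hypothetical paths; adjust to your local checkpoint and params files.
# Assumes the script accepts --qnn for the QNN backend and --ptq 8a8w
# for 8-bit activation / 8-bit weight post-training quantization.
python -m examples.models.llama2.export_llama \
  -c llama2.pt -p params.json -kv \
  --qnn --ptq 8a8w
```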
2~3 toks/s for 8a8w still seems really slow - do we know which part is causing the perf regression? Does the delegated part run reasonably fast while the CPU part is...
> Dear @shewu-quic @cccclai,
>
> does [PR 3196](https://github.com/pytorch/executorch/pull/3196) resolve the issue #2590? If so, I will close the issue. Thank you in advance!

Thanks for the update and sending...
Are you trying to lower the model to CoreML by passing `--coreml`? We're still actively working on enabling llama2 7b with CoreML. The xnnpack backend is ready for llama2 7b...
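As a rough sketch, exporting for XNNPACK instead might look like the following; the `-X` flag and the file paths are assumptions about the export script, not something confirmed in this thread:
```
# Hypothetical invocation; -X is assumed to select the XNNPACK backend.
python -m examples.models.llama2.export_llama \
  -c llama2.pt -p params.json -kv -X
```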
xnnpack (https://github.com/google/XNNPACK) is a software library with a set of highly optimized CPU operators. It works on iOS too. Regarding the CoreML questions, I'd defer to @cymbalrush and @YifanShenSZ to...
@Jack-Khuu is the on-device evaluation ready? edit: Actually CoreML should be able to run on Mac too, @antmikinka are you looking for on-device evaluation, or just evaluating the CoreML...
I think it's related to how we expect eval to work with a delegated model, in this case CoreML.
I checked out this PR and ran
```
git submodule sync
git submodule update --init
./backends/apple/mps/install_requirements.sh
python -m examples.models.llama2.export_llama -kv --mps
```
but still can't repro...
Have it working with this patch:
```
diff --git a/backends/apple/mps/partition/mps_partitioner.py b/backends/apple/mps/partition/mps_partitioner.py
index e5497389d..8e22169c0 100644
--- a/backends/apple/mps/partition/mps_partitioner.py
+++ b/backends/apple/mps/partition/mps_partitioner.py
@@ -43,12 +43,6 @@ class MPSOperatorSupport(OperatorSupportBase):
         self.edge_program = edge_program

     def is_node_supported(self, submodules,...
```
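To try the same workaround locally, one way is to save the diff to a file and apply it with git; `mps_partitioner.patch` is just an illustrative filename:
```
# From the executorch repo root; the patch filename is hypothetical.
git apply mps_partitioner.patch
```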