Koan-Sin Tan

Results: 238 comments by Koan-Sin Tan

> I've attempted to generate an int8 quantized model, but no matter how much I tried, the output was garbled and nothing like the fp16 one.
>
> I should...

Gemma 3 could be another candidate: https://blog.google/technology/developers/gemma-3/

Please:
1. test the 3B model,
2. check exactly what quantization ai-edge-torch applies, and
3. use the "standard" TFLite quantization tool to quantize the model.
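The "standard" TFLite route here is post-training dynamic-range quantization. A minimal sketch, using a toy matmul model as a stand-in for the real exported network (the actual flow would start from the Llama graph, not this toy):

```python
import tensorflow as tf

# Toy stand-in for the real model: a single matmul with trainable weights,
# so dynamic-range quantization has something to quantize.
m = tf.Module()
m.w = tf.Variable(tf.random.normal([8, 4]))

@tf.function(input_signature=[tf.TensorSpec([1, 8], tf.float32)])
def forward(x):
    return tf.matmul(x, m.w)

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [forward.get_concrete_function()], m)
# "Standard" TFLite post-training quantization: weights become int8,
# activations stay float (dynamic-range quantization).
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_bytes)
```

Comparing the ops and tensor types in this flatbuffer against what ai-edge-torch emits would answer step 2.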

@freedomtan to check whether he can make the quantized Llama 3 1B work, too.

@farook-edev I tested the quantized Llama 3.2 1B tflite model just now (March 26th, 2025) on a Mac mini M4. It worked as expected. 1. create a python venv...
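The smoke test itself can be sketched with the TFLite interpreter. A toy model stands in for the 1B checkpoint below; with the real file one would pass `model_path="..."` instead of `model_content`:

```python
import numpy as np
import tensorflow as tf

# Toy model converted with dynamic-range quantization, standing in for
# the quantized Llama 3.2 1B .tflite file.
m = tf.Module()
m.w = tf.Variable(tf.random.normal([8, 4]))

@tf.function(input_signature=[tf.TensorSpec([1, 8], tf.float32)])
def forward(x):
    return tf.matmul(x, m.w)

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [forward.get_concrete_function()], m)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

# Load and invoke, the same way one would run a model from disk.
interp = tf.lite.Interpreter(model_content=tflite_bytes)
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]
interp.set_tensor(inp["index"], np.ones(inp["shape"], dtype=np.float32))
interp.invoke()
result = interp.get_tensor(out["index"])
```

"Worked as expected" for the LLM case means the decoded text is coherent, not just that `invoke()` succeeds; the garbled-output reports above are exactly the failure this check catches.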

Check whether we can run MMLU on Android. Most likely we will need the [TinyMMLU](https://huggingface.co/datasets/tinyBenchmarks/tinyMMLU) subset, because even if we can run the full MMLU, it takes a lot of...
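The time concern is easy to quantify with a back-of-the-envelope estimate. The 20 s/question latency below is an assumption for an on-device model, not a measurement; the dataset sizes are the published ones (~14k questions in the MMLU test split, 100 in tinyMMLU):

```python
# Rough cost comparison: full MMLU vs. tinyMMLU on a phone.
MMLU_QUESTIONS = 14_042      # approximate size of the MMLU test split
TINY_MMLU_QUESTIONS = 100    # size of tinyBenchmarks/tinyMMLU
SECONDS_PER_QUESTION = 20.0  # hypothetical on-device latency, not measured

full_hours = MMLU_QUESTIONS * SECONDS_PER_QUESTION / 3600
tiny_minutes = TINY_MMLU_QUESTIONS * SECONDS_PER_QUESTION / 60

print(f"full MMLU: ~{full_hours:.0f} hours")
print(f"tinyMMLU:  ~{tiny_minutes:.0f} minutes")
```

Even with an optimistic latency assumption, the full benchmark is days-versus-minutes against tinyMMLU on a phone.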

From the client working group:
- performance metrics: time-to-first-token and tokens-per-second (excluding the first token, i.e., decoding)
- 4 categories
- context length: 4K (trying to increase to 8K; how about for mobile...
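The two metrics fall out of per-token timestamps; a minimal sketch (the trace below is synthetic, not a benchmark result):

```python
def ttft_and_decode_tps(token_times, start_time):
    """Time-to-first-token and decode tokens/s (first token excluded).

    token_times: monotonically increasing completion times of each
    generated token, in seconds.
    """
    ttft = token_times[0] - start_time
    decode_tokens = len(token_times) - 1       # first token excluded
    decode_span = token_times[-1] - token_times[0]
    tps = decode_tokens / decode_span if decode_span > 0 else float("inf")
    return ttft, tps

# Synthetic trace: prompt processing takes 0.5 s, then one token every 50 ms.
start = 0.0
times = [0.5 + 0.05 * i for i in range(101)]   # 101 tokens total
ttft, tps = ttft_and_decode_tps(times, start)
print(f"TTFT = {ttft:.2f} s, decode = {tps:.1f} tok/s")
```

Excluding the first token matters because it folds in prompt prefill, which scales with context length rather than decoding speed.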

It turns out that running MMLU or tinyMMLU on Android with instruction-tuned models is quite trivial: format the questions properly as input prompts and we get the expected results. For example,...
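Such prompt formatting can be sketched generically like this; the sample question is illustrative (not an actual MMLU item), and a real model's chat template may wrap this further:

```python
def format_mmlu_prompt(question, choices):
    """Render a four-choice MMLU-style item as a zero-shot prompt."""
    letters = "ABCD"
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(letters, choices)]
    lines.append("Answer with a single letter (A, B, C, or D).")
    return "\n".join(lines)

# Illustrative question, not taken from the MMLU dataset.
prompt = format_mmlu_prompt(
    "What is the capital of France?",
    ["Berlin", "Paris", "Madrid", "Rome"],
)
print(prompt)
```

Scoring then reduces to checking whether the model's first emitted letter matches the gold choice.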

@anhappdev please help check ExecuTorch: https://github.com/pytorch/executorch, https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md

- out-of-memory: quantized 1B w/ XNNPACK
- w/o XNNPACK: gibberish
- source: ai-edge-torch

@freedomtan to try to see if he can build the ai-edge-torch example for Android. And test the...