Koan-Sin Tan

Results 251 comments of Koan-Sin Tan

MMLU (0-shot, 5-shot): - @freedomtan to test 0-shot and 1-shot and find out the context and sequence lengths.
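A minimal sketch of how the context length for a given number of shots could be measured: build each prompt from its few-shot examples plus the question, tokenize it, and take the longest. The helper names (`build_prompt`, `max_context_length`, `fits`) and the whitespace tokenizer are assumptions made for illustration; a real run would use the model's own tokenizer, so actual counts will differ.

```python
from typing import Callable, Iterable, List


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): splits on whitespace only.
    # Replace with the model's tokenizer to get real token counts.
    return text.split()


def build_prompt(shots: Iterable[str], question: str) -> str:
    # Concatenate the few-shot examples, then the question,
    # separated by blank lines.
    return "\n\n".join([*shots, question])


def max_context_length(
    prompts: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> int:
    # Longest prompt, in tokens; 0 when there are no prompts.
    return max((len(tokenize(p)) for p in prompts), default=0)


def fits(
    prompts: Iterable[str],
    limit: int,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> bool:
    # True when every prompt stays within the given context limit.
    return max_context_length(prompts, tokenize) <= limit


if __name__ == "__main__":
    # Toy data, not real MMLU questions.
    zero_shot = [build_prompt([], "Q: 2 + 2 = ? A:")]
    one_shot = [build_prompt(["Q: 1 + 1 = ? A: 2"], "Q: 2 + 2 = ? A:")]
    print(max_context_length(zero_shot), max_context_length(one_shot))
```

Running the same measurement per `num_fewshot` setting shows how quickly the required context grows as shots are added.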

running MMLU with `lm_eval`

| num_fewshot | context length |
|--|--|
| 0 | < 1024 |
| 1 | could > 1024 |
| 5 | could > 3072 |

with something like
```
python -m executorch.examples.models.llama.eval_llama -c "${LLAMA_CHECKPOINT:?}"...
```

> evaluated with `lm_eval --model hf --model_args pretrained=meta-llama/... --tasks mmlu --num_fewshot 5` on Colab (w/ L4 GPU)
>
> | model | MMLU (5-shot) |
> |--|--|
> | 3.2 1B Instruct | 0.4557 ± 0.0041 |
> ...

[NNAPI is now considered to be formally deprecated](https://developer.android.com/ndk/guides/neuralnetworks). @dylanzika-google: will Google update the Pixel backend, which uses NNAPI?

On Pixel 10 Pro XL, NNAPI isn't working well.

@swasson488 @petermattson FYI: Pixel 10 numbers are clearly worse than Pixel 9 numbers. We need Google colleagues to fix it.

Why Pixel 10 numbers are not good:
1. for MobileNet V4 and EDSR, fully delegated, but clearly slower
2. for MobileDet, more ops not delegated (on Pixel 9, only the post...

with TFLite CLI benchmark_model: `benchmark_model --graph=${MODEL_PATH} --use_nnapi=1 --nnapi_allow_fp16=1 --enable_op_profiling=1`
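For repeated runs across models, the same invocation could be wrapped in a small script. This is only a sketch: the flags are exactly the ones in the command above, while the function names and the assumption that a `benchmark_model` binary is on the local `PATH` are illustrative. On a phone the tool is usually launched through a device shell, which is not shown here.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_benchmark_cmd(model_path: str, binary: str = "benchmark_model") -> List[str]:
    # Same flags as the command above: NNAPI delegate, fp16 allowed,
    # per-op profiling enabled.
    return [
        binary,
        f"--graph={model_path}",
        "--use_nnapi=1",
        "--nnapi_allow_fp16=1",
        "--enable_op_profiling=1",
    ]


def run_benchmark(model_path: str, binary: str = "benchmark_model") -> Optional[str]:
    # Returns the tool's stdout, or None when the binary is not found
    # (assumption: it is expected on the local PATH).
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        build_benchmark_cmd(model_path, binary),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # "model.tflite" is a placeholder path, not a file from the thread.
    print(shlex.join(build_benchmark_cmd("model.tflite")))
```

Keeping the command as a list avoids shell-quoting problems when model paths contain spaces.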

for performance (and quick): tinymmlu (or ifeval maybe); for accuracy: tinymmlu and ifeval

> [@freedomtan](https://github.com/freedomtan) I found a way to use different dataset files per `normal`, `quick`, and `rapid` runs, but they all have to use the same dataset (AKA `mlperf::QuerySampleLibrary`). Please let...