Koan-Sin Tan
*gemma 3 1b*

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-----------------|------:|--------|------:|--------|---|------:|---|-------|
| tinyBenchmarks | N/A | | | | | | | |
| - tinyArc | 0 | none | ...
> > > After some exploration, the use cases we are trying to enable (say summarization, context generation, etc.) are not properly captured by the datasets used in tinyBenchmarks....
> [@freedomtan](https://github.com/freedomtan) From the tinyBenchmarks page, the ones I have marked as Single-Token had descriptions similar to single-token outputs (PFB). We have not run them to be exactly sure. But...
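For context on what "single-token output" means here: harnesses like lm_eval typically score a multiple-choice task by comparing the model's log-likelihood of each candidate answer token (e.g. "A".."D") and taking the argmax, rather than generating free-form text. A minimal sketch of that idea, with made-up log-probabilities standing in for a real model:

```python
def pick_choice(logprobs: dict) -> str:
    """Pick the answer whose single answer token the model finds most likely.

    `logprobs` maps each candidate answer token (e.g. "A".."D") to the
    model's log-probability of that token given the question prompt.
    """
    return max(logprobs, key=logprobs.get)

# Made-up numbers standing in for a real model's next-token log-probs.
example = {"A": -2.3, "B": -0.4, "C": -1.9, "D": -3.1}
print(pick_choice(example))  # "B": the most likely single answer token
```

This is why such tasks are cheap to run: each question needs only one forward pass per choice (or one pass total, reading four logits), no decoding loop.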
How about the quantized models from the Meta folks? We know they are available on Hugging Face too:
- https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8
- https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8

Well, they are not in the Hugging Face safetensors format, that is, we...
@mohitmundhragithub @Aswinoss and @Mostelk OpenOrca is a dataset, not a benchmark.
> This paper https://arxiv.org/pdf/2208.03299 also has an interesting code base that may be easier to integrate than lm-eval or tiny lm-eval; just focus on the zero-shot cases for our use case:...
As I said, Meta's quantized Llama 3.2 3B models could be evaluated with ExecuTorch code. With

```bash
export LLAMA_DIR="/Users/freedom/.llama/checkpoints"
export LLAMA_QUANTIZED_CHECKPOINT=${LLAMA_DIR}/"Llama3.2-3B-Instruct-int4-qlora-eo8/consolidated.00.pth"
export LLAMA_PARAMS=${LLAMA_DIR}/"Llama3.2-3B-Instruct-int4-qlora-eo8/params.json"
export LLAMA_TOKENIZER=${LLAMA_DIR}/"Llama3.2-3B-Instruct-int4-qlora-eo8/tokenizer.model"
python -m executorch.examples.models.llama.eval_llama \
  --model...
```
To get baseline numbers, Llama 3.2 3B Instruct MMLU with `lm_eval`:

```bash
$ lm_eval --model hf --model_args pretrained=meta-llama/Llama-3.2-3B-Instruct --tasks mmlu --num_fewshot 5
```

I got

hf (pretrained=meta-llama/Llama-3.2-3B-Instruct), gen_kwargs: (None), limit:...
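When collecting several such runs, the pipe-delimited results table `lm_eval` prints can be parsed back into structured records with a small helper. A sketch (this `parse_lm_eval_row` helper is hypothetical, not part of lm_eval; it assumes the standard column order Tasks | Version | Filter | n-shot | Metric | arrow | Value | ± | Stderr, and the numbers in the sample row are illustrative):

```python
def parse_lm_eval_row(line: str) -> dict:
    """Parse one data row of lm_eval's pipe-delimited results table.

    Hypothetical helper: assumes the standard column order
    Tasks | Version | Filter | n-shot | Metric | (arrow) | Value | (±) | Stderr.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return {
        "task": cells[0],
        "metric": cells[4],
        "value": float(cells[6]),
        "stderr": float(cells[8]),
    }

# Illustrative row in lm_eval's output format (numbers are made up).
row = "|mmlu | 2|none | 5|acc |↑ |0.6048|± |0.0038|"
print(parse_lm_eval_row(row))
```

Handy when comparing the quantized variants against the fp16 baseline across tasks.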
How does ExecuTorch's `executorch.examples.models.llama.eval_llama` work? Mainly, it calls lm_eval's `evaluator.simple_evaluate()`; see https://github.com/pytorch/executorch/blob/main/examples/models/llama/eval_llama_lib.py#L295-L320 and https://github.com/EleutherAI/lm-evaluation-harness/blob/8bc4afff22e73995883de41018388428e39f8a92/lm_eval/evaluator.py#L47
Evaluated with `lm_eval --model hf --model_args pretrained=meta-llama/... --tasks mmlu --num_fewshot 5` on Colab (w/ L4 GPU):

| model | MMLU (5-shot) |
|-------|---------------|
| 3.2 1B Instruct | 0.4557 ± 0.0041 |
| 3.2 ...
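The ± column is a standard error. For an accuracy metric over n questions it is roughly the binomial standard error sqrt(p·(1−p)/n); the aggregate MMLU stderr lm_eval reports is a pooled variant, so this is only an approximation. A quick sanity check, assuming MMLU's test split of 14,042 questions (an assumption on my part):

```python
import math

def binomial_stderr(p: float, n: int) -> float:
    """Approximate standard error of an accuracy p measured over n items."""
    return math.sqrt(p * (1.0 - p) / n)

# 0.4557 accuracy over ~14,042 MMLU test questions (assumed split size).
print(round(binomial_stderr(0.4557, 14042), 4))  # ~0.0042, close to the reported ±0.0041
```

Close enough to the table's ±0.0041 to confirm the numbers are in the right ballpark.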