Mostelk
This paper https://arxiv.org/pdf/2208.03299 also has an interesting code base that may be easier to integrate than lm-eval or tiny lm eval; just focus on the zero-shot cases for our case: https://github.com/facebookresearch/atlas?tab=readme-ov-file#tasks...
> > This paper https://arxiv.org/pdf/2208.03299 also has interesting code base that may be easier for integration than lm-eval or tiny lm eval, just focus on the zero-shot cases for our...
> Let us try to quantize these and report accuracy for Llama 3.1 8B Instruct and Llama 3.2 3B Instruct. We will use MMLU (5-shot) to report accuracies after quantizing these...
Let us check the mmlu-llama benchmark with 0 and 5 shots; we also need to decide on input and output sequence lengths.
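One way to pick a sensible input sequence length is to assemble a few 5-shot MMLU prompts and measure how long they come out. A minimal sketch below; the prompt template and the field names (`question`, `choices`, `answer`) are assumptions, loosely following the common lm-eval-style MMLU format, not the exact mmlu-llama template:

```python
def build_mmlu_prompt(dev_examples, question, choices, k=5):
    """Assemble a k-shot MMLU-style multiple-choice prompt.

    dev_examples: list of dicts with "question", "choices" (4 strings),
    and "answer" (index 0-3). Template is an assumption, not the exact
    mmlu-llama one.
    """
    letters = "ABCD"
    parts = []
    for ex in dev_examples[:k]:
        parts.append(ex["question"])
        parts.extend(f"{l}. {c}" for l, c in zip(letters, ex["choices"]))
        parts.append(f"Answer: {letters[ex['answer']]}")
        parts.append("")  # blank line between shots
    parts.append(question)
    parts.extend(f"{l}. {c}" for l, c in zip(letters, choices))
    parts.append("Answer:")
    return "\n".join(parts)

# Hypothetical example just to exercise the template:
demo_dev = [{"question": "2+2=?", "choices": ["3", "4", "5", "6"], "answer": 1}]
prompt = build_mmlu_prompt(demo_dev, "3+3=?", ["5", "6", "7", "8"])
# Crude proxy for input length until a real tokenizer is wired in:
approx_tokens = len(prompt.split())
```

Running this over a sample of the dev set with the actual tokenizer would tell us the input length we need to budget; the output side for MMLU is tiny (one answer letter).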
How about we use perplexity to measure the accuracy, similar to this ExecuTorch example for Llama 3.1 8B using LM_EVAL, and with similar settings to that example for max input...
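For reference, the perplexity that lm-eval-style harnesses report is just the exponentiated mean negative log-likelihood per token. A minimal self-contained sketch of the metric itself (not of the harness or the ExecuTorch runner):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood.

    token_logprobs: natural-log probabilities the model assigned to
    each target token in the evaluation text.
    """
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Two tokens each assigned probability 0.5 -> perplexity is exactly 2.0
print(perplexity([math.log(0.5), math.log(0.5)]))  # 2.0
```

Lower is better, so comparing the quantized model's perplexity against the fp16 baseline on the same text and sequence length gives a quick regression signal without running full MMLU.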
> How about we use perplexity to measure the accuracy, similar to this ExecuTorch example for Llama 3.1 8B: using LM_EVAL, and using similar settings in this example of max...
@swasson488 We would like your help on this, given we are putting the app on the Play Store.
Also, Stable Diffusion is currently not accelerated on Pixel phones; it needs to be quantized and delegated to the Edge TPU.
@farook-edev @anhappdev would like to test, but we don't have a download link for the reference models.

```
benchmark_setting {
  benchmark_id: "llm"
  framework: "TFLite"
  delegate_choice: {
    delegate_name: "CPU"
    accelerator_name: "cpu"
    accelerator_desc: "CPU"
    ...
```