LLM Dataset Implementation

Open farook-edev opened this issue 5 months ago • 7 comments

This issue is a general container for dataset-related matters. Discussions specific to TinyMMLU or IFEval should go in the corresponding sub-issues.

Current Status

TinyMMLU

  • [x] Dataset is converted from .parquet to .tfrecord via a utility script.
  • [x] Dataset loads .tfrecord and stores data inside samples.
  • [x] Dataset provides samples by id to driver/backend in proper format.*
  • [x] Dataset processes output from driver/backend.*
  • [x] Dataset calculates and provides accuracy using output data on device.

IFEval

  • [x] Dataset is converted from .jsonl to .tfrecord via a utility script.
  • [x] Dataset loads .tfrecord and stores data inside samples.
  • [x] Dataset provides samples by id to driver/backend in proper format.*
  • [x] Dataset processes output from driver/backend.*
  • [x] Dataset calculates and provides accuracy using output data on device.

* This includes tokenization/detokenization using common SentencePiece utility code.
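
For readers unfamiliar with the last checklist step, here is a rough sketch of how accuracy could be computed for a multiple-choice dataset such as TinyMMLU. It is not the project's on-device code, which this issue does not show. The function names, the A-D answer format, and the "first character of the detokenized output" rule are all assumptions made for illustration.

```python
from typing import Iterable, Optional

# Assumption: answers are one of four letters, as in typical MMLU-style prompts.
CHOICES = "ABCD"


def extract_choice(generated_text: str) -> Optional[str]:
    """Return the first non-whitespace character if it is a valid choice letter."""
    stripped = generated_text.strip()
    if not stripped:
        return None
    first = stripped[0].upper()
    return first if first in CHOICES else None


def accuracy(outputs: Iterable[str], answers: Iterable[str]) -> float:
    """Fraction of detokenized outputs whose extracted letter matches the answer."""
    pairs = list(zip(outputs, answers))
    if not pairs:
        return 0.0
    correct = sum(1 for out, ans in pairs if extract_choice(out) == ans)
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical detokenized model outputs and reference answers.
    outputs = [" B", "c) because", "I think D"]
    answers = ["B", "C", "D"]
    print(f"accuracy = {accuracy(outputs, answers):.3f}")
```

Under this rule an output such as "I think D" counts as incorrect because its first character is not a choice letter; a real scorer may use a more lenient or stricter extraction. IFEval checks instruction-following constraints rather than a single letter, so it would need a different scoring function.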

farook-edev, Oct 01 '25 23:10