
fixed mmmu by cooking options from answer, when options is not given in the instance

Open dafnapension opened this issue 4 months ago • 0 comments

There are 30 cards in the group cards.mmmu.*, 16 of which (more than half) are broken: they do not pass unitxt.api.load_dataset (see mmmu_main.pdf).

Exploring the original HF datasets turned up the following observations (see mmmu_observations.pdf):

  • answer=="?" if and only if the instance is in split test
  • the mmmu card effectively discards split test
  • there are 10500 instances in split test, 900 in validation (which the card uses as its test split) and 150 in dev (which the card uses as its train split).
  • field options (which becomes the choices) is empty in 53 validation instances, 9 dev instances and 627 test instances.
  • when not empty, field options has more than one entry (up to 9), and answer indexes into it by letter: A, B, C, ...
  • only when options is empty does field answer hold an irregular value: what looks like the text of the correct answer to the question itself, rather than a letter (A, B, C) referring to it.
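
The observations above can be summarized as a small classification sketch. This is only an illustration, not code from the card: the helper name answer_kind and its labels are hypothetical, and it assumes each instance is a plain dict whose options field has already been parsed into a Python list.

```python
from string import ascii_uppercase


def answer_kind(instance):
    """Classify an instance by how its answer field should be read.

    Hypothetical helper; assumes instance["options"] is a Python list.
    """
    options = instance["options"]
    answer = instance["answer"]
    if answer == "?":
        # Observed only in the test split, where answers are hidden.
        return "hidden"
    if not options:
        # The answer field holds the answer text itself.
        return "free_form"
    # Valid letters are the first len(options) capitals: A, B, C, ...
    letters = ascii_uppercase[: len(options)]
    if len(answer) == 1 and answer in letters:
        return "letter"
    return "irregular"
```

Under these assumptions, an instance with options ["3", "4"] and answer "B" is classified as "letter", while one with empty options and answer "12.5" is classified as "free_form".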

Therefore, the fix is as follows: for an instance with an empty options field, the card cooks an options field in the form of [answer], and then changes answer to read A.
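
A minimal sketch of that fix on a single instance, written as plain Python rather than as the actual card preprocessing steps. The function name cook_options is hypothetical, and the sketch assumes the instance is a dict whose options field is already a Python list.

```python
def cook_options(instance):
    """Give an instance without options a single option built from its answer.

    Hypothetical helper; instances that already have options are returned
    unchanged.
    """
    if instance["options"]:
        return instance
    fixed = dict(instance)  # shallow copy, leave the input untouched
    fixed["options"] = [instance["answer"]]  # the answer text becomes the only choice
    fixed["answer"] = "A"  # and the answer now points at that choice
    return fixed
```

For example, an instance with options [] and answer "12.5" becomes one with options ["12.5"] and answer "A". Since the card discards split test, instances with answer "?" are not expected to reach this step.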

This fixed all the errors:

fixed_mmmu.pdf

dafnapension, Aug 20 '25 09:08