Add evaluation configs under phi3 dir
Context
Tracker: https://github.com/pytorch/torchtune/issues/1810

What is the purpose of this PR? Is it to
- [ ] add a new feature
- [ ] fix a bug
- [ ] update tests and/or documentation
- [x] clean up
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
- Copied evaluation.yaml into the phi3/ directory
- Updated evaluation.yaml to point to the Phi-3 Mini model instantiations
- Updated the recipe registry to pick up the new config
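
The resulting config looks roughly like the sketch below, reconstructed from the resolved config in the eval log further down; the checkpoint paths are placeholders for a local model download and will differ per setup:

```yaml
# recipes/configs/phi3/evaluation.yaml (sketch; paths are local placeholders)
model:
  _component_: torchtune.models.phi3.phi3_mini

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct
  checkpoint_files:
    - model-00001-of-00002.safetensors
    - model-00002-of-00002.safetensors
  model_type: PHI3_MINI

tokenizer:
  _component_: torchtune.models.phi3.phi3_mini_tokenizer
  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model

tasks: ["truthfulqa_mc2"]
batch_size: 8
device: cpu
dtype: bf16
```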
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- [x] run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
- [ ] add unit tests for any new functionality
- [x] update docstrings for any new or updated methods or classes
- [ ] run unit tests via pytest tests
- [ ] run recipe tests via pytest tests -m integration_test
- [x] manually run any new or modified recipes with sufficient proof of correctness
- [x] include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)
Phi3 Eleuther eval recipe output:
```
(torchtune) Abdullahs-MacBook-Pro:Phi-3-mini-4k-instruct abdullah$ tune run eleuther_eval --config phi3/evaluation
W1012 00:32:22.842000 8517586752 torch/distributed/elastic/multiprocessing/redirects.py:28] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO:torchtune.utils._logging:Running EleutherEvalRecipe with resolved config:
batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct/models--microsoft--Phi-3-mini-4k-instruct/snapshots/0a67737cc96d2554230f90338b163bc6380a2a85
  checkpoint_files:
  - model-00001-of-00002.safetensors
  - model-00002-of-00002.safetensors
  model_type: PHI3_MINI
  output_dir: /tmp/Phi-3-mini-4k-instruct/models--microsoft--Phi-3-mini-4k-instruct/snapshots/0a67737cc96d2554230f90338b163bc6380a2a85
  recipe_checkpoint: null
device: cpu
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.phi3.phi3_mini
quantizer: null
resume_from_checkpoint: false
seed: 1234
tasks:
- truthfulqa_mc2
tokenizer:
  _component_: torchtune.models.phi3.phi3_mini_tokenizer
  max_seq_len: null
  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model
INFO:torchtune.utils._logging:Converting Phi-3 Mini weights from HF format. Note that conversion of adapter weights into PEFT format is not supported.
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
config.json: 100%|████████████████████████████| 665/665 [00:00<00:00, 769kB/s]
tokenizer_config.json: 100%|██████████████████| 26.0/26.0 [00:00<00:00, 106kB/s]
vocab.json: 100%|█████████████████████████████| 1.04M/1.04M [00:00<00:00, 1.16MB/s]
merges.txt: 100%|█████████████████████████████| 456k/456k [00:00<00:00, 1.26MB/s]
tokenizer.json: 100%|█████████████████████████| 1.36M/1.36M [00:00<00:00, 1.90MB/s]
model.safetensors: 100%|██████████████████████| 548M/548M [03:43<00:00, 2.45MB/s]
generation_config.json: 100%|█████████████████| 124/124 [00:00<00:00, 2.14MB/s]
README.md: 100%|██████████████████████████████| 9.59k/9.59k [00:00<00:00, 16.1MB/s]
validation-00000-of-00001.parquet: 100%|██████| 271k/271k [00:00<00:00, 1.36MB/s]
Generating validation split: 100%|████████████| 817/817 [00:00<00:00, 45492.21 examples/s]
INFO:torchtune.utils._logging:Running evaluation on the following tasks: ['truthfulqa_mc2']
INFO:lm-eval:Building contexts for truthfulqa_mc2 on rank 0...
100%|█████████████████████████████████████████| 817/817 [00:00<00:00, 2446.11it/s]
INFO:lm-eval:Running loglikelihood requests
Running loglikelihood requests: 100%|█████████| 5882/5882 [22:42:29<00:00, 13.90s/it]
INFO:torchtune.utils._logging:Eval completed in 81751.63 seconds.
INFO:torchtune.utils._logging:Max memory allocated: 0.00 GB
INFO:torchtune.utils._logging:
|    Tasks     |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------|------:|------|-----:|------|---|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.5456|±  |0.0151|
```
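
For context on how `tune run eleuther_eval --config phi3/evaluation` finds the new file: config names of the form `<model>/<name>` are resolved to YAML files under torchtune's recipe config directory. A minimal sketch of that naming convention (the `resolve_config` helper and the `recipes/configs` root here are illustrative assumptions, not torchtune's actual registry API):

```python
from pathlib import PurePosixPath

def resolve_config(name: str, root: str = "recipes/configs") -> str:
    """Map a CLI config name like 'phi3/evaluation' to its YAML path.

    Illustrative only: torchtune's recipe registry performs the real
    lookup internally when `tune run` parses `--config`.
    """
    return str(PurePosixPath(root) / f"{name}.yaml")

print(resolve_config("phi3/evaluation"))  # recipes/configs/phi3/evaluation.yaml
```

This is why the changelog above also touches the recipe registry: the new `phi3/evaluation` name has to be registered before the CLI will accept it.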
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1822
:white_check_mark: No Failures
As of commit 604d2e6cdb1c8f366a589d43986f3eea231b7130 with merge base 78ceee6786d577866edd42fb678260c25690834e:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
@Harthi7 Would you mind merging main branch and running the linter? Then, we can go ahead and get this merged :)
@Harthi7 nice, thank you! Please see linting instructions here to take care of lint related failure, https://github.com/pytorch/pytorch/blob/main/CONTRIBUTING.md#local-linting
We actually use separate linting tools from pytorch core : ) see here https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md#coding-style
@SalmanMohammadi aha, nice to know :) Thank you so much for sharing!
Hello @joecummings and @RdoubleA, I merged with main and ran the lint command. Please review the changes and let me know if there is anything I have missed.