Add evaluation configs under phi3 dir
Context
Tracker: https://github.com/pytorch/torchtune/issues/1810

What is the purpose of this PR? Is it to
- [ ] add a new feature
- [ ] fix a bug
- [ ] update tests and/or documentation
- [x] clean up
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
- Copied evaluation.yaml into the phi3/ directory
- Updated evaluation.yaml to point to the Phi-3 Mini model instantiations
- Updated the recipe registry to pick up the new config
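
The resulting config looks roughly like the sketch below, reconstructed from the resolved config in the eval log further down; the checkpoint paths are placeholders for a local model download and will differ per setup:

```yaml
# recipes/configs/phi3/evaluation.yaml (sketch; paths are local placeholders)
model:
  _component_: torchtune.models.phi3.phi3_mini

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct
  checkpoint_files:
    - model-00001-of-00002.safetensors
    - model-00002-of-00002.safetensors
  model_type: PHI3_MINI

tokenizer:
  _component_: torchtune.models.phi3.phi3_mini_tokenizer
  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model

tasks: ["truthfulqa_mc2"]
batch_size: 8
device: cpu
dtype: bf16
```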
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- [x] run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
- [ ] add unit tests for any new functionality
- [x] update docstrings for any new or updated methods or classes
- [ ] run unit tests via pytest tests
- [ ] run recipe tests via pytest tests -m integration_test
- [x] manually run any new or modified recipes with sufficient proof of correctness
- [x] include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)
Phi3 Eleuther eval recipe output:
```
(torchtune) Abdullahs-MacBook-Pro:Phi-3-mini-4k-instruct abdullah$ tune run eleuther_eval --config phi3/evaluation
W1012 00:32:22.842000 8517586752 torch/distributed/elastic/multiprocessing/redirects.py:28] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO:torchtune.utils._logging:Running EleutherEvalRecipe with resolved config:
batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct/models--microsoft--Phi-3-mini-4k-instruct/snapshots/0a67737cc96d2554230f90338b163bc6380a2a85
  checkpoint_files:
  - model-00001-of-00002.safetensors
  - model-00002-of-00002.safetensors
  model_type: PHI3_MINI
  output_dir: /tmp/Phi-3-mini-4k-instruct/models--microsoft--Phi-3-mini-4k-instruct/snapshots/0a67737cc96d2554230f90338b163bc6380a2a85
  recipe_checkpoint: null
device: cpu
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.phi3.phi3_mini
quantizer: null
resume_from_checkpoint: false
seed: 1234
tasks:
- truthfulqa_mc2
tokenizer:
  _component_: torchtune.models.phi3.phi3_mini_tokenizer
  max_seq_len: null
  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model
INFO:torchtune.utils._logging:Converting Phi-3 Mini weights from HF format. Note that conversion of adapter weights into PEFT format is not supported.
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
config.json: 100%|████████████████████████████| 665/665 [00:00<00:00, 769kB/s]
tokenizer_config.json: 100%|██████████████████| 26.0/26.0 [00:00<00:00, 106kB/s]
vocab.json: 100%|█████████████████████████████| 1.04M/1.04M [00:00<00:00, 1.16MB/s]
merges.txt: 100%|█████████████████████████████| 456k/456k [00:00<00:00, 1.26MB/s]
tokenizer.json: 100%|█████████████████████████| 1.36M/1.36M [00:00<00:00, 1.90MB/s]
model.safetensors: 100%|██████████████████████| 548M/548M [03:43<00:00, 2.45MB/s]
generation_config.json: 100%|█████████████████| 124/124 [00:00<00:00, 2.14MB/s]
README.md: 100%|██████████████████████████████| 9.59k/9.59k [00:00<00:00, 16.1MB/s]
validation-00000-of-00001.parquet: 100%|██████| 271k/271k [00:00<00:00, 1.36MB/s]
Generating validation split: 100%|████████████| 817/817 [00:00<00:00, 45492.21 examples/s]
INFO:torchtune.utils._logging:Running evaluation on the following tasks: ['truthfulqa_mc2']
INFO:lm-eval:Building contexts for truthfulqa_mc2 on rank 0...
100%|█████████████████████████████████████████| 817/817 [00:00<00:00, 2446.11it/s]
INFO:lm-eval:Running loglikelihood requests
Running loglikelihood requests: 100%|█████████| 5882/5882 [22:42:29<00:00, 13.90s/it]
INFO:torchtune.utils._logging:Eval completed in 81751.63 seconds.
INFO:torchtune.utils._logging:Max memory allocated: 0.00 GB
INFO:torchtune.utils._logging:
|    Tasks     |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------|------:|------|-----:|------|---|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.5456|±  |0.0151|
```
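
For context on how `tune run eleuther_eval --config phi3/evaluation` finds the new file: config names of the form `<model>/<name>` are resolved to YAML files under torchtune's recipe config directory. A minimal sketch of that naming convention (the `resolve_config` helper and the `recipes/configs` root here are illustrative assumptions, not torchtune's actual registry API):

```python
from pathlib import PurePosixPath

def resolve_config(name: str, root: str = "recipes/configs") -> str:
    """Map a CLI config name like 'phi3/evaluation' to its YAML path.

    Illustrative only: torchtune's recipe registry performs the real
    lookup internally when `tune run` parses `--config`.
    """
    return str(PurePosixPath(root) / f"{name}.yaml")

print(resolve_config("phi3/evaluation"))  # recipes/configs/phi3/evaluation.yaml
```

This is why the changelog above also touches the recipe registry: the new `phi3/evaluation` name has to be registered before the CLI will accept it.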
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1822
:white_check_mark: No Failures
As of commit 604d2e6cdb1c8f366a589d43986f3eea231b7130 with merge base 78ceee6786d577866edd42fb678260c25690834e:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
@Harthi7 Would you mind merging main branch and running the linter? Then, we can go ahead and get this merged :)
@Harthi7 nice, thank you! Please see linting instructions here to take care of lint related failure, https://github.com/pytorch/pytorch/blob/main/CONTRIBUTING.md#local-linting
We actually use separate linting tools from pytorch core : ) see here https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md#coding-style
@SalmanMohammadi aha, nice to know :) Thank you so much for sharing!
Hello @joecummings and @RdoubleA, I merged with main and ran the lint command. Please review the changes and let me know if there is anything I have missed.