1810 move gemma evaluation
Context
tracker: https://github.com/pytorch/torchtune/issues/1810

What is the purpose of this PR? Is it to
- [ ] add a new feature
- [ ] fix a bug
- [ ] update tests and/or documentation
- [x] clean up
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
- Copied `evaluation.yaml` into the `gemma/` config directory
- Updated `evaluation.yaml` to point to the Gemma 2B model and tokenizer instantiations
- Updated the recipe registry to pick up the new config
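The recipe-registry change can be sketched as follows. torchtune keeps a list of recipe entries, each with its supported configs; registering the new config under a `gemma/` name is what lets `tune run eleuther_eval --config gemma/evaluation` resolve the file. The `Recipe`/`Config` dataclasses and the `resolve` helper below are simplified stand-ins for illustration, not torchtune's actual registry API:

```python
from dataclasses import dataclass


@dataclass
class Config:
    # name is what the user passes to --config; file_path is relative
    # to the recipe's config directory
    name: str
    file_path: str


@dataclass
class Recipe:
    name: str
    configs: list


# Simplified stand-in for the eleuther_eval registry entry, with a
# hypothetical new Config added for the gemma evaluation config.
eleuther_eval = Recipe(
    name="eleuther_eval",
    configs=[
        Config(name="eleuther_evaluation", file_path="eleuther_evaluation.yaml"),
        Config(name="gemma/evaluation", file_path="gemma/evaluation.yaml"),
    ],
)


def resolve(recipe: Recipe, config_name: str) -> str:
    """Look up the config file registered under config_name."""
    for cfg in recipe.configs:
        if cfg.name == config_name:
            return cfg.file_path
    raise KeyError(f"no config named {config_name!r} for recipe {recipe.name}")


print(resolve(eleuther_eval, "gemma/evaluation"))  # gemma/evaluation.yaml
```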
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- [x] run pre-commit hooks and linters (make sure you've first installed via `pre-commit install`)
- [ ] add unit tests for any new functionality
- [ ] update docstrings for any new or updated methods or classes
- [x] run unit tests via `pytest tests`
- [ ] run recipe tests via `pytest tests -m integration_test`
- [x] manually run any new or modified recipes with sufficient proof of correctness
- [x] include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)
Gemma Eleuther eval recipe output:
```
(torchtune_rosetta) linjaboy@Mohammads-MacBook-Pro 9cf48e52b224239de00d483ec8eb84fb8d0f3a3a % tune run eleuther_eval --config gemma/evaluation
W1012 01:59:27.011000 8088854208 torch/distributed/elastic/multiprocessing/redirects.py:28] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO:torchtune.utils._logging:Running EleutherEvalRecipe with resolved config:
batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/gemma-2b/models--google--gemma-2b/snapshots/9cf48e52b224239de00d483ec8eb84fb8d0f3a3a
  checkpoint_files:
  - model-00001-of-00002.safetensors
  - model-00002-of-00002.safetensors
  model_type: GEMMA
  output_dir: ./
device: cpu
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.gemma.gemma_2b
quantizer: null
seed: 1234
tasks:
- truthfulqa_mc2
tokenizer:
  _component_: torchtune.models.gemma.gemma_tokenizer
  path: /tmp/gemma-2b/models--google--gemma-2b/snapshots/9cf48e52b224239de00d483ec8eb84fb8d0f3a3a/tokenizer.model
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
config.json: 100%|███████████████████████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 1.06MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████| 26.0/26.0 [00:00<00:00, 119kB/s]
vocab.json: 100%|████████████████████████████████████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 1.14MB/s]
merges.txt: 100%|██████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 1.57MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 2.14MB/s]
model.safetensors: 100%|███████████████████████████████████████████████████████████████████| 548M/548M [01:32<00:00, 5.95MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 712kB/s]
README.md: 100%|█████████████████████████████████████████████████████████████████████████| 9.59k/9.59k [00:00<00:00, 5.65MB/s]
validation-00000-of-00001.parquet: 100%|███████████████████████████████████████████████████| 271k/271k [00:00<00:00, 2.79MB/s]
Generating validation split: 100%|████████████████████████████████████████████████| 817/817 [00:00<00:00, 26416.89 examples/s]
INFO:torchtune.utils._logging:Running evaluation on the following tasks: ['truthfulqa_mc2']
INFO:lm-eval:Building contexts for truthfulqa_mc2 on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████████████| 817/817 [00:00<00:00, 2121.83it/s]
INFO:lm-eval:Running loglikelihood requests
Running loglikelihood requests: 100%|███████████████████████████████████████████████████| 5882/5882 [8:26:40<00:00, 5.17s/it]
INFO:torchtune.utils._logging:Eval completed in 30404.48 seconds.
INFO:torchtune.utils._logging:Max memory allocated: 0.00 GB
INFO:torchtune.utils._logging:
|     Tasks    |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------|------:|------|-----:|------|---|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.3995|±  |0.0152|
```
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1819
:white_check_mark: No Failures
As of commit d1d12338ec7cf03444611910b9b5761e6698d52d with merge base 7744608c4c455a86b21f4ce0642e6c19bc1cebf4:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
A couple of very small changes, but otherwise looks great!
Hey @joecummings, thanks for the review. I have addressed the comments now.