
Evaluate on specified data

Open Peter-Devine opened this issue 1 year ago • 3 comments

⚠️ Please check that this feature request hasn't been suggested before.

  • [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • [X] I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

I want to evaluate on data that may be distinct from the training data.

Currently, the evaluation data is a random sample of the training data, but I have a situation where I have a lot of training data from a slightly noisy Dataset A and a very small amount of very high-quality data from Dataset B.

I want to be able to train on Dataset A and evaluate on Dataset B.

✔️ Solution

When using a Hugging Face dataset, it would be nice to use its actual validation split as the eval_dataset for training. That way, you could explicitly specify which data is used for training and which is used for validation.

I think some code would have to be refactored in https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/data.py

Thanks!

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this feature has not been requested yet.
  • [X] I have provided enough information for the maintainers to understand and evaluate this request.

Peter-Devine avatar Nov 17 '23 04:11 Peter-Devine

I would need this feature as well

codiceSpaghetti avatar Dec 28 '23 17:12 codiceSpaghetti

Any updates on this enhancement? Thanks!

JiyangZhang avatar Jan 08 '24 21:01 JiyangZhang

Bump. It would be really handy to be able to evaluate continuously on a specified dataset, different from the training dataset, so that we could control early stopping etc. based on performance on a target task.

For example, if we are training on unstructured text but evaluating on a small structured test dataset, this could help us find the optimal amount of training for transfer learning to the target task.

Thanks.

Peter-Devine avatar Jan 18 '24 05:01 Peter-Devine

Hey, PR #786 now allows specifying a separate test_dataset:. We also have bench_dataset if you want to run benchmarks (more info: https://github.com/OpenAccess-AI-Collective/axolotl/issues/311#issuecomment-2028311885).
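
For anyone finding this later, a minimal config sketch of the train-on-A / evaluate-on-B setup described above. The dataset paths are placeholders, and the exact key names (singular vs. plural, required fields like split) should be verified against the axolotl docs for your version:

```yaml
# Train on the large, slightly noisy dataset ("Dataset A")
datasets:
  - path: my-org/noisy-dataset-a   # placeholder Hugging Face dataset id
    type: alpaca

# Evaluate on the small, high-quality dataset ("Dataset B")
# instead of a random slice of the training data
test_datasets:
  - path: my-org/clean-dataset-b   # placeholder
    type: alpaca
    split: train

# Disable the random train/eval split, since a separate
# eval dataset is provided above
val_set_size: 0
```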

NanoCode012 avatar Mar 30 '24 18:03 NanoCode012