Use training/tuning/testing dataset terminology

michaelmhoffman opened this issue 6 years ago • 11 comments

Have you checked the list of proposed rules to see if the rule has already been proposed?

  • [x] Yes

Did you add yourself as a contributor by making a pull request if this is your first contribution?

  • [x] Yes, I added myself or am already a contributor

Feel free to elaborate, rant, and/or ramble.

I would suggest the use of training/tuning/testing rather than training/validating/testing or (God forbid) training/testing/validating to avoid any confusion in different domains. I see inconsistent use of this terminology in existing issues and draft text. See #19 for an example.

Any citations for the rule? (peer-reviewed literature preferred but not required)

  • https://twitter.com/michaelhoffman/status/989251677646704641
  • https://twitter.com/michaelhoffman/status/989972609042395136
  • https://twitter.com/CarldeBoerPhD/status/989318174431764480

michaelmhoffman avatar Dec 06 '18 21:12 michaelmhoffman

Big fan of this change 👍 . It'll be nice to get away from two different field-specific vocabularies that intersect painfully in bioinformatics.

cgreene avatar Dec 07 '18 13:12 cgreene

What label should an issue like this have? "meta" or something else?

michaelmhoffman avatar Dec 07 '18 13:12 michaelmhoffman

> What label should an issue like this have? "meta" or something else?

I agree with "meta". It is proposing terminology to use consistently as opposed to a new rule or discussion of a specific paper.

However, I'm not convinced that we want to adopt training/tuning/testing terminology. It is a more accurate description of how the datasets are used, but proposing uncommon terminology could confuse our target audience when they read other machine learning and deep learning literature (obligatory https://twitter.com/michaelhoffman/status/989977986471514112).

There are certainly different uses of validation and test set terminology across domains, but within machine learning most sources are consistent. The Wikipedia article in your tweet above has many sources, and the random textbooks I grabbed (Kevin Murphy and Christopher Bishop's ML books) also use training/validation/test.

agitter avatar Dec 07 '18 15:12 agitter

I'm a fan of training/validating/testing myself, but I would be happy with whatever doesn't confuse readers. I think we should come to a consensus for consistency and then just mention in the text, "Here, we use a/b/c, but we note this terminology is interchangeable with x/y/z."

pstew avatar Dec 07 '18 15:12 pstew

I think train/validate/test is the more commonly used name, but I agree that train/tune/test makes it more intuitive what each set is for. I haven't seen many people actually use train/tune/test in practice, and I agree with @agitter that proposing uncommon terminology as a rule may not be the best idea.

jmschrei avatar Dec 07 '18 17:12 jmschrei

The problem is that, for many people in biomedical research, validation is common terminology for what comes after testing. You can argue that using validation for what comes before testing is more correct, but that doesn't eliminate the confusion.

Some people say December 7, 2018 should be written 12/7/2018. Others say it should be written 7/12/2018. There is ample historical precedent in different communities for both, and you can argue until you're blue in the face about which one is correct or makes more sense. Or you can switch to the unambiguous 2018-12-07.

michaelmhoffman avatar Dec 07 '18 20:12 michaelmhoffman

Perhaps the rule here should be that, when discussing these splits, one should define how each set is used clearly enough that readers from another field can follow?

jmschrei avatar Dec 07 '18 21:12 jmschrei

> The problem is that, for many people in biomedical research, validation is common terminology for what comes after testing. You can argue that using validation for what comes before testing is more correct, but that doesn't eliminate the confusion.

Yeah, technically both a validation and a test set are used for "validation"/evaluation, just at different stages of the pipeline. From a language perspective, I would say a test set and a validation set mean the same thing; also, in k-fold cross-validation, we usually refer to training and "test" folds.
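As a small illustration, here is a sketch (assuming scikit-learn and a toy array) of how the k-fold API itself uses "train"/"test" naming for the folds:

```python
# Minimal sketch, assuming scikit-learn and a toy feature matrix, showing that
# k-fold cross-validation labels the held-out fold "test" rather than "validation".
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy feature matrix

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # scikit-learn returns (train, test) index pairs, even though the held-out
    # fold often plays the role of a validation/tuning set.
    print(f"fold {fold}: train={train_idx}, test={test_idx}")
```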

I like the term "tuning dataset" though, because it disambiguates things a bit for practitioners who are not familiar with ML jargon.

Maybe the best of both worlds is to use the "official jargon" but explain it well. This will then also help people when reading ML literature. For example, we could say that we split the dataset into three parts: a training, validation, and test set. The validation set is used to evaluate model performance during training and model selection and thus can be regarded as a "tuning dataset." In contrast, the test set should only be used once, after tuning has completed, to evaluate the final performance of the model (if we only use the test set once, it gives an unbiased estimate of the generalization performance).
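For concreteness, a minimal sketch of that workflow, assuming scikit-learn, a synthetic classification dataset, and a plain logistic regression model as stand-ins:

```python
# Minimal sketch of a training / tuning (validation) / testing split,
# assuming scikit-learn; the synthetic data and logistic regression model
# are placeholders, not a recommendation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Carve off the test set first; it is touched exactly once, at the very end.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder into training and tuning (a.k.a. validation) sets.
X_train, X_tune, y_train, y_tune = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Use the tuning set for model selection (here, picking a regularization strength).
best_model, best_score = None, -float("inf")
for C in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_tune, y_tune)
    if score > best_score:
        best_model, best_score = model, score

# Evaluate once on the untouched test set for the final performance estimate.
print("test accuracy:", best_model.score(X_test, y_test))
```

However the sets are named, the point is the same: the test set sits outside the model-selection loop entirely.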

rasbt avatar Dec 07 '18 21:12 rasbt

I'm okay using training/tuning/testing if we also prepare readers for the terms they will encounter in machine learning or biomedical literature. There is precedent for training/tuning/testing in machine learning; the earliest reference I found (so far) is from *Readings in Machine Learning*.

agitter avatar Dec 07 '18 22:12 agitter

I had never heard of the terms training/tuning/testing until just now, only training/validating/testing. With that being said, I am 100% a fan of training/tuning/testing because:

  1. Alliteration
  2. It actually makes intuitive sense

I am personally going to use that terminology in my own research going forward, so thank you @michaelmhoffman for introducing it to me.

Benjamin-Lee avatar Dec 08 '18 03:12 Benjamin-Lee

For the record, @hugoaerts was already using it here before I arrived 😄.

michaelmhoffman avatar Dec 08 '18 03:12 michaelmhoffman