Fairly validate your model with proper test/train data
Have you checked the list of proposed rules to see if the rule has already been proposed?
- [X] Yes
Feel free to elaborate, rant, and/or ramble.
Any citations for the rule? (peer-reviewed literature preferred but not required)
This could be merged with rules proposed elsewhere, but I feel it is important enough to be its own rule.
I agree, this is a fairly important one! On the flip side, this is not unique to deep learning and applies to machine learning in general. Nonetheless, I feel we should mention it explicitly. But to make it more DL-specific for these "10 rules for DL", we may want to think about a more specific title for this rule.
I.e., something along the lines of: while large test sets may be sufficiently reliable estimators of generalization performance, we still need to ensure that test sets remain independent of the training data.
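To make the independence point concrete, here's a minimal sketch (assuming scikit-learn; `X`, `y`, and `lab_ids` are hypothetical placeholders for real data) of a group-aware split that holds out entire labs, so correlated samples from one lab can't leak across the train/test boundary the way they can under a naive random split:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder data: features, binary labels, and the lab that produced each sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
lab_ids = rng.integers(0, 5, size=100)

# Hold out entire labs so no lab appears in both train and test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=lab_ids))

# Sanity check: no lab identifier is shared between the two sets.
assert set(lab_ids[train_idx]).isdisjoint(lab_ids[test_idx])
```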
It's also nice to see models validated on external, publicly available datasets generated by different labs, potentially even across platforms (e.g., microarray vs. RNA-seq). Biological reproducibility!
It is also good to think about the potential applications that users might find for your model, and then validate/invalidate those applications if possible. For instance, validating variant effect prediction of TF binding models across many kinds of variants (e.g. indels, SNPs) and not just looking at accuracy of TF binding predictions.
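A quick sketch of that idea (assuming pandas and scikit-learn; the `variant_type`, `y_true`, and `y_score` columns are made-up placeholders for a model's predictions on a labeled variant set): compute the metric separately per variant class instead of reporting one aggregate, so a model that only works on SNPs can't hide behind an overall number:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical per-variant predictions from a TF binding model.
preds = pd.DataFrame({
    "variant_type": ["SNP", "SNP", "SNP", "indel", "indel", "indel"],
    "y_true":       [1,     0,     1,     1,       0,       1],
    "y_score":      [0.9,   0.2,   0.8,   0.4,     0.3,     0.6],
})

# AUROC per variant class exposes failure modes an overall score would hide.
for vtype, grp in preds.groupby("variant_type"):
    print(vtype, roc_auc_score(grp["y_true"], grp["y_score"]))
```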
Completely agree with all.
How might we rephrase this to make it more DL-specific? Or should we leave it about ML in general and add DL-specific caveats as we explain?
I really like Evan's examples.
Regarding performance metrics etc., I just want to throw in this reference:
- Korotcov A, Tkachenko V, Russo DP, Ekins S: Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets. Molecular Pharmaceutics 2017, 14:4462–4475. https://pubs.acs.org/doi/10.1021/acs.molpharmaceut.7b00578