Perform sanity checks and follow good coding practices

Open • evancofer opened this issue 7 years ago • 8 comments

Have you checked the list of proposed rules to see if the rule has already been proposed?

  • [x] Yes

Did you add yourself as a contributor by making a pull request if this is your first contribution?

  • [x] Yes, I added myself or am already a contributor

Feel free to elaborate, rant, and/or ramble.

When coding DL models, it is important to maintain good software engineering practices. All code should be documented and include rigorous tests. Sanity checks are also useful. For instance, something is probably wrong (e.g., a bug in the code, an ill-posed problem, or bad hyperparameters) if the training loss does not drop substantially (i.e., the model fails to overfit) when the model is trained on a very small subset of the training data.
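The small-subset check can be sketched in plain Python. This is a minimal illustration under stated assumptions: a one-feature logistic regression stands in for a DL model, the toy data are synthetic and linearly separable, and the helper names (`train`, `overfit_small_subset`) and the 90% loss-drop threshold are hypothetical choices. It also includes the "loss at chance" check suggested in the cs231n notes: with all parameters at zero every prediction is 0.5, so the first recorded loss should equal ln 2.

```python
import math
import random

# Hypothetical sketch: a tiny logistic regression stands in for a DL model.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(xs, ys, lr=0.5, epochs=200):
    """Fit a one-feature logistic regression with full-batch gradient descent.

    Returns the mean binary cross-entropy per epoch, measured before
    that epoch's parameter update.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        loss = grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            q = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp only inside log()
            loss -= y * math.log(q) + (1 - y) * math.log(1.0 - q)
            grad_w += (p - y) * x
            grad_b += p - y
        losses.append(loss / n)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return losses


def overfit_small_subset(xs, ys, subset_size=8, seed=0, min_drop=0.9):
    """Sanity check: training on a tiny subset should drive the loss way down.

    Returns (passed, losses); passed is True when the final loss is at
    least `min_drop` (a fraction, assumed 90% here) below the initial loss.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(xs)), subset_size)
    losses = train([xs[i] for i in idx], [ys[i] for i in idx])
    passed = losses[-1] <= (1.0 - min_drop) * losses[0]
    return passed, losses


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, linearly separable data: class 0 at x < 0, class 1 at x > 0.
    xs = [rng.uniform(-3, -1) for _ in range(50)] + [rng.uniform(1, 3) for _ in range(50)]
    ys = [0] * 50 + [1] * 50
    passed, losses = overfit_small_subset(xs, ys)
    # At w = b = 0 every prediction is 0.5, so the first loss is ln 2.
    assert abs(losses[0] - math.log(2)) < 1e-9
    print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}, passed={passed}")
```

If `passed` comes back False on data this easy, the bug is most likely in the training loop itself, which is exactly what the check is meant to catch before any larger experiment.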

Any citations for the rule? (peer-reviewed literature preferred but not required)

  • https://arxiv.org/pdf/1206.5533v2.pdf
  • http://cs231n.github.io/neural-networks-3/#sanitycheck

evancofer, Nov 21 '18 01:11

This is somewhat similar to #49, #35, and #21.

evancofer, Nov 21 '18 01:11

I think it would be good to have your suggestion as a separate rule, but it is also somewhat connected to #42: when using deep learning, as opposed to "traditional" machine learning, we usually need a larger, more extensive model selection step. In addition, we need to babysit the different model fitting procedures more closely, evaluating the internal procedure ("does it converge?") and not just the external metrics ("what is the prediction accuracy?").
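The distinction between checking the internal procedure ("does it converge?") and only the external metrics can be illustrated with a small helper that inspects the training-loss curve itself. This is a hypothetical sketch: the function name `diagnose_training`, its five labels, the 10-epoch window, and the 1e-3 relative tolerance are illustrative assumptions, not something proposed in this thread.

```python
import math
from typing import Sequence

# Hypothetical helper; the window and tolerance defaults are illustrative.


def diagnose_training(losses: Sequence[float], window: int = 10,
                      rel_tol: float = 1e-3) -> str:
    """Classify a training-loss curve (one value per epoch).

    Returns one of:
      'diverged'  - a loss is NaN/inf, or the last loss exceeds the first
      'too_short' - not enough epochs to judge the last `window` of them
      'unstable'  - the loss rose over the last `window` epochs
      'converged' - the loss changed by less than `rel_tol` (relative)
                    over the last `window` epochs
      'improving' - the loss is still falling; training longer may help
    """
    if not losses:
        raise ValueError("losses must not be empty")
    if any(not math.isfinite(v) for v in losses) or losses[-1] > losses[0]:
        return "diverged"
    if len(losses) <= window:
        return "too_short"
    start = losses[-window - 1]  # loss at the beginning of the window
    rel_change = (start - losses[-1]) / max(abs(start), 1e-12)
    if rel_change <= -rel_tol:
        return "unstable"
    if rel_change < rel_tol:
        return "converged"
    return "improving"


if __name__ == "__main__":
    print(diagnose_training([1.0 / (t + 1) for t in range(100)]))  # slow 1/t decay
    print(diagnose_training([0.5] * 50))                           # flat curve
    print(diagnose_training([1.0, 0.5, float("nan")]))             # NaN loss
```

Because the labels depend only on the loss history, a helper like this can run alongside whatever external metric is being tracked, rather than replacing it.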

rasbt, Nov 21 '18 01:11

Indeed. In many cases, however, sanity checks need to happen before hyperparameter optimization.

evancofer, Nov 21 '18 02:11

All code should be documented and include rigorous tests.

With respect to this, I just published a paper on this exact topic that might be worth citing. I'm happy to expand further on documentation best practices for DL.

Benjamin-Lee, Dec 23 '18 10:12

I also recommend "Top considerations for creating bioinformatics software documentation" as a reference on software documentation.

agitter, Dec 23 '18 20:12

@Benjamin-Lee Congrats! I think this and the paper linked by @agitter would be great to cite here.

pstew, Dec 23 '18 21:12

Those all look like relevant citations, and we should definitely keep them in mind as we draft. IIRC, this discussion of testing and other software engineering best practices was going to go into Tip #1 (deep learning is still machine learning), but maybe it should go somewhere else if we intend to discuss certain aspects (e.g., testing) in more detail?

evancofer, Dec 24 '18 22:12

This is partly mentioned in Tip 3; we still need to add this reference to it.

fmaguire, Feb 21 '19 22:02