
Overall discussion for Tip 4

SiminaB opened this issue 4 years ago • 7 comments

This is to discuss outstanding issues for Tip 4: Know your data and your question. https://github.com/Benjamin-Lee/deep-rules/blob/master/content/06.know-your-problem.md

SiminaB, Oct 07 '20 18:10

  • I'm not sure the data simulation portion belongs here:

Data simulation is a powerful approach to develop an understanding of how data and analytical methods interact. In data simulation, a model is used to learn the true distribution of a training set for the purpose of creating new data points. Often, researchers may perform simulations under some assumptions about the data generating process to identify useful model architectures and hyperparameters. Simulated datasets can be used to verify the correctness of a model’s implementation. To accurately test the performance of the model, it is important that simulated datasets be generated for a range of parameters. For example, varying the parameters to violate the model’s assumptions can test the sensitivity of the model’s performance. Parameter tuning the simulation can help researchers identify the key features that drive method performance. In other cases, neural networks can be used to simulate data to better understand how to structure analyses. For example, it is possible to study how analytical strategies cope with varying number of noise sources by using neural networks to simulate genome-wide data [24]. Simulating data from assumptions about the data generating distribution can help to debug or characterize deep learning models, and deep learning models can also simulate data in cases where it is hard to make reasonable assumptions from first principles.
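As a rough illustration of the workflow the quoted paragraph describes (simulate data from a known generating process, check that a method recovers the known parameters, then vary the simulation parameters to violate the method's assumptions), here is a minimal Python sketch. It is not taken from the manuscript: to keep it short it uses a plain least-squares line fit instead of a deep learning model, and the helper names `simulate`, `fit_slope`, and `slope_error`, as well as all parameter values, are hypothetical.

```python
import random
import statistics

# Hypothetical helpers for illustration only; not from the deep-rules manuscript.


def simulate(n, slope, intercept, noise_sd, outlier_frac=0.0, seed=0):
    """Draw (x, y) pairs from a known linear model with Gaussian noise.

    outlier_frac > 0 adds a large shift to roughly that fraction of y values,
    deliberately violating the Gaussian-noise assumption of least squares.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = []
    for x in xs:
        y = intercept + slope * x + rng.gauss(0.0, noise_sd)
        if rng.random() < outlier_frac:
            y += rng.choice([-1.0, 1.0]) * 50.0  # gross outlier
        ys.append(y)
    return xs, ys


def fit_slope(xs, ys):
    """Ordinary least-squares slope estimate: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_error(true_slope=2.0, **sim_kwargs):
    """Absolute error of the fitted slope against the known true slope."""
    xs, ys = simulate(slope=true_slope, intercept=1.0, **sim_kwargs)
    return abs(fit_slope(xs, ys) - true_slope)


if __name__ == "__main__":
    # 1. Correctness check: with assumptions met and little noise,
    #    the fitted slope should land close to the known true slope.
    assert slope_error(n=5000, noise_sd=0.1) < 0.05

    # 2. Sensitivity sweep: vary noise level and outlier contamination.
    for noise_sd in (0.1, 1.0):
        for frac in (0.0, 0.05, 0.2):
            err = slope_error(n=2000, noise_sd=noise_sd, outlier_frac=frac, seed=1)
            print(f"noise_sd={noise_sd:<4} outlier_frac={frac:<4} |error|={err:.3f}")
```

Because the true slope is known by construction, step 1 acts as a test of the implementation, while step 2 shows how much worse the estimate tends to get as the simulated data drift away from the method's assumptions.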

SiminaB, Oct 07 '20 18:10

Agreed. It still feels out of place, even after it was moved (from tip 1?) and further tweaked (e.g. #234). Not sure about where it fits best but open to suggestions.

pstew, Oct 07 '20 19:10

Simulating data from assumptions about the data generating distribution can help to debug or characterize deep learning models, and deep learning models can also simulate data in cases where it is hard to make reasonable assumptions from first principles.

First half of this sentence feels very hard to read.

signalbash, Oct 11 '20 22:10

@pstew and @SiminaB, the data simulation paragraph just jumped out at me for feeling out of place both content-wise and stylistically. I'm tempted to cut it down to a sentence or two and tuck it into an existing paragraph somewhere. The paper is already quite long and I don't think this paragraph is adding a ton of useful content.

Benjamin-Lee, Oct 17 '20 18:10

@Benjamin-Lee I agree. Go for it.

pstew, Oct 19 '20 13:10

@pstew review requested for #269

Benjamin-Lee, Oct 19 '20 20:10

@Benjamin-Lee Thanks! Approved!

pstew, Oct 20 '20 13:10