
Understand the data

Open khyu opened this issue 7 years ago • 5 comments

Before hitting the data with fancy DL algorithms, always spend some time understanding how the data were generated, how the samples were selected, and what assumptions were made during data generation and cleaning.

khyu avatar Nov 03 '18 22:11 khyu

This applies to data science in general, not just DL.

khyu avatar Nov 03 '18 22:11 khyu

I was going to add an additional rule, but it may fit here: have a hypothesis for why deep learning would work. For example, an image contains structural information, so recognizing local features and pooling them works well in a CNN. In structured data without explicit ordering, why would we expect a CNN to work?
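A minimal sketch of the ordering point, added for illustration and not part of the original comment: the feature values, the kernel, and the `conv1d` helper below are all made up. It shows that a 1D convolution over a tabular row depends on the column order, even though that order is arbitrary for such data.

```python
def conv1d(row, kernel):
    """Valid-mode 1D cross-correlation of one row with a kernel."""
    k = len(kernel)
    return [
        sum(row[i + j] * kernel[j] for j in range(k))
        for i in range(len(row) - k + 1)
    ]

# Hypothetical tabular sample: four features with no natural ordering.
row = [1.0, 5.0, 2.0, 8.0]
kernel = [1.0, -1.0]  # simple local-difference filter

# The same sample with its columns listed in a different, equally valid order.
perm = [2, 0, 3, 1]
shuffled = [row[i] for i in perm]

original_out = conv1d(row, kernel)
shuffled_out = conv1d(shuffled, kernel)

# The local features differ even as multisets, so the filter is picking up
# the arbitrary column layout rather than anything intrinsic to the sample.
print(original_out)
print(shuffled_out)

# An order-invariant summary (here, the sum) is unaffected by the shuffle.
print(sum(row) == sum(shuffled))
```

For image pixels, neighboring positions are meaningfully related, so the same kind of local filter captures real structure; for unordered columns there is no such reason to expect it to.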

brettbj avatar Nov 12 '18 22:11 brettbj

I see my comment applies more accurately to #22

brettbj avatar Nov 13 '18 03:11 brettbj

Implicitly mentioned in https://github.com/Benjamin-Lee/deep-rules/blob/master/content/06.know-your-problem.md but might want to make more explicit, especially @brettbj's comment about having a hypothesis for why DL might work for your data.

fmaguire avatar Feb 22 '19 00:02 fmaguire

Yeah. Unfortunately, there's no real guideline (similar to classic machine learning, where there is no hard recommendation for which feature engineering approach to use), but we could maybe give some rule-of-thumb advice to make this more tractable. I.e., I imagine the main audience will be researchers who are not familiar with DL (yet) and wonder: "Would DL help with my problem?" We could say something along the lines of: if we have a large, unstructured dataset in raw form, usually text or image data, DL could potentially be useful because it may be able to automatically extract features that are not obvious to a human. Or something like that.

rasbt avatar Feb 22 '19 23:02 rasbt