Udacity-Computer-Vision-Nanodegree-Program

Split the data into training and testing

Open vidhi-mody opened this issue 5 years ago • 3 comments

An 80/20 train/test ratio would be a good choice.

vidhi-mody avatar Jul 04 '20 18:07 vidhi-mody

Suggestion: When performing the train/test split, use a fixed seed so that you get the same split every time you rerun the code. Also, check whether the function you use to split has an option to stratify the data. If you use sklearn, it gives you that option. Stratification matters in multiclass classification because a purely random split can put most of the samples of some class into one side (train or test), leaving the other side without enough samples of that class. Stratifying makes sure the class distribution stays the same in both train and test. You can go through this blog for a detailed understanding: https://towardsdatascience.com/3-things-you-need-to-know-before-you-train-test-split-869dfabb7e50
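A rough sketch of what this could look like with sklearn's `train_test_split`, using the 80/20 ratio proposed above. The toy arrays, the `SEED` value, and the `split` helper are made up for illustration and are not taken from this repository; the real features and labels would go in their place.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples in 3 imbalanced classes (60/30/10).
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 60 + [1] * 30 + [2] * 10)

SEED = 42  # assumption: any fixed integer gives a reproducible split


def split(X, y, seed=SEED):
    """80/20 split that keeps the class proportions of y in both parts."""
    return train_test_split(
        X, y,
        test_size=0.2,      # 20% of the samples go to the test set
        random_state=seed,  # fixed seed -> same split on every run
        stratify=y,         # preserve class ratios in train and test
    )


X_train, X_test, y_train, y_test = split(X, y)

# Per-class sample counts in each part, to check the stratification.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train:", train_counts, "test:", test_counts)
```

Without `stratify=y`, the smallest class (10 samples here) could by chance end up almost entirely in one part. Without `random_state`, every rerun would shuffle differently, so results would not be comparable between runs.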

ankurbhatia24 avatar Jul 04 '20 19:07 ankurbhatia24

I would like to work on this issue!

deepeshgarg09 avatar Jul 11 '20 17:07 deepeshgarg09

@deepeshgarg09 sure!

vidhi-mody avatar Jul 11 '20 17:07 vidhi-mody