makemore

remove duplicate words from the dataset

Open · iamdoron opened this issue 3 years ago • 0 comments

Hi,

Thanks for your videos! I just finished watching the first part.

When I tried to intersect the test and train datasets, I noticed that some names repeat in the dataset:

len(words) - len(set(words))  # 2539
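The one-liner above gives the total number of extra copies but not which names are repeated. A minimal sketch using `collections.Counter` to list them, assuming `words` is a plain list of name strings; `find_duplicates` and the toy list are hypothetical stand-ins for the real dataset:

```python
from collections import Counter

def find_duplicates(words):
    """Return {word: count} for every word that appears more than once."""
    counts = Counter(words)
    return {w: c for w, c in counts.items() if c > 1}

# hypothetical toy list standing in for the names dataset
words = ["emma", "olivia", "ava", "emma", "mia", "ava", "emma"]
dups = find_duplicates(words)
print(dups)  # {'emma': 3, 'ava': 2}

# the extra copies (count - 1 per duplicated word) add up to the same
# number the one-liner reports
print(sum(c - 1 for c in dups.values()), len(words) - len(set(words)))  # 3 3
```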

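This might bias the test results, since a name that appears in both splits has effectively been seen during training, and it adds a small bias during training, since duplicated names are weighted more heavily than the rest.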
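One way to avoid both effects is to deduplicate before splitting. The sketch below assumes a simple shuffle-then-slice split; the function name `dedupe_and_split`, the `test_frac` default of 0.1, and the seed are hypothetical choices for illustration, not necessarily how the repo splits its data:

```python
import random

def dedupe_and_split(words, test_frac=0.1, seed=42):
    """Drop duplicate words, then shuffle and split into (train, test)."""
    # dict.fromkeys keeps the first occurrence of each word in original order,
    # so the list is deterministic before shuffling
    unique = list(dict.fromkeys(words))
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(unique)
    n_test = int(len(unique) * test_frac)
    return unique[n_test:], unique[:n_test]

words = ["emma", "olivia", "ava", "emma", "mia", "ava",
         "emma", "isabella", "sophia", "mia"]
train, test = dedupe_and_split(words, test_frac=0.2)
print(len(train), len(test), set(train) & set(test))  # 5 1 set()
```

Because every name now appears exactly once, a name can only land in one split, so the train/test intersection is empty by construction.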

iamdoron · Dec 07 '22 10:12