Codes-for-WSDM-CUP-Music-Rec-1st-place-solution

label-encoder encoding missing values

Open • cheershuaizhao opened this issue 7 years ago • 1 comment

Line 54 of id_process.py raises the following error:

Traceback (most recent call last):
  File "", line 8, in
  File "/home/shuai/anaconda3/envs/python27/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 133, in transform
    raise ValueError("y contains new labels: %s" % str(diff))
ValueError: y contains new labels: [nan nan nan ..., nan nan nan]

I guess the reason is that missing values should be replaced before using label encoders, according to this post: https://stackoverflow.com/questions/36808434/label-encoder-encoding-missing-values

I added two lines and it works fine:

train[column].replace(np.nan, 'NAN', inplace=True)
test[column].replace(np.nan, 'NAN', inplace=True)
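For readers who want to try the fix outside the repository, here is a minimal self-contained sketch. The toy frames and the column name `genre` are assumptions for illustration, not data from id_process.py. It uses `fillna` with plain assignment rather than `replace(..., inplace=True)`, and `pd.concat` rather than `Series.append`, because the latter was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frames standing in for the real data (hypothetical values).
train = pd.DataFrame({"genre": ["pop", np.nan, "rock"]})
test = pd.DataFrame({"genre": ["rock", "jazz", np.nan]})
column = "genre"

# Fill missing values with a sentinel string so every entry is a str.
train[column] = train[column].fillna("NAN")
test[column] = test[column].fillna("NAN")

# Fit on the union of train and test so both share one code table.
column_encoder = LabelEncoder()
column_encoder.fit(pd.concat([train[column], test[column]]))

train[column] = column_encoder.transform(train[column])
test[column] = column_encoder.transform(test[column])
```

With this approach the sentinel 'NAN' becomes an ordinary class with its own integer code, so missing values are no longer distinguishable from real categories downstream.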

cheershuaizhao, Dec 30 '17 21:12

ValueError Traceback (most recent call last)
in ()
      4 column_encoder = LabelEncoder()
      5 column_encoder.fit(train[column].append(test[column]))
----> 6 train[column] = column_encoder.transform(train[column])
      7 test[column] = column_encoder.transform(test[column])
      8

/Users/xm/cs231/lib/python2.7/site-packages/sklearn/preprocessing/label.pyc in transform(self, y)
    151 if len(np.intersect1d(classes, self.classes_)) < len(classes):
    152     diff = np.setdiff1d(classes, self.classes_)
--> 153     raise ValueError("y contains new labels: %s" % str(diff))
    154 return np.searchsorted(self.classes_, y)
    155

ValueError: y contains new labels: [nan nan nan ..., nan nan nan]

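I ran into the same thing. As I understand it, judging from how the author handles nan elsewhere in the code, nan values are meant to be left in place and not processed. I have not been able to fully work out how turning them into "NAN" affects the later groupby steps, so I hope the author can clear this up.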
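One general pandas behavior bears on the groupby question: by default, `groupby` excludes rows whose key is NaN, whereas a filled-in 'NAN' string forms a group of its own. The sketch below shows only that difference on made-up data. The column names `artist` and `plays` are hypothetical, and whether the repository's features depend on this is not established here.

```python
import numpy as np
import pandas as pd

# Made-up data: two rows have a missing grouping key.
df = pd.DataFrame({
    "artist": ["a", np.nan, "a", np.nan, "b"],
    "plays": [1, 2, 3, 4, 5],
})

# Default groupby excludes rows whose key is NaN.
with_nan = df.groupby("artist")["plays"].sum()

# After filling, the formerly missing rows form their own "NAN" group.
filled = df.assign(artist=df["artist"].fillna("NAN"))
with_sentinel = filled.groupby("artist")["plays"].sum()

print(with_nan.to_dict())
print(with_sentinel.to_dict())
```

So any per-group statistic computed after the replacement would treat all missing entries as one shared category instead of skipping them.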

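I downgraded sklearn from 0.19.1 to 0.18.1 and still hit this problem of the label encoder crashing on nan. The workarounds are all ugly: one is to build a map and call apply(map) on the column; the other is to pull out a sub-table, transform it, and then assign the result back.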
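The two workarounds mentioned above might look roughly like the following sketch, which keeps nan in place instead of replacing it. The toy data, the column name `genre`, and the helper `encode_keep_nan` are all hypothetical; the sketch uses `Series.map` for the first workaround. Because an integer column cannot hold NaN, the encoded result is a float column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frames standing in for the real data (hypothetical values).
train = pd.DataFrame({"genre": ["pop", np.nan, "rock"]})
test = pd.DataFrame({"genre": ["rock", "jazz", np.nan]})
column = "genre"

# Distinct non-missing values across train and test, in sorted order.
values = sorted(pd.concat([train[column], test[column]]).dropna().unique())

# Workaround 1: map each value to a code; NaN has no key, so it stays NaN.
code_map = {value: code for code, value in enumerate(values)}
train_mapped = train[column].map(code_map)
test_mapped = test[column].map(code_map)

# Workaround 2: transform only the non-missing subset, then assign it back.
encoder = LabelEncoder().fit(values)

def encode_keep_nan(series):
    """Encode non-missing entries with the fitted encoder, leaving NaN as NaN."""
    out = pd.Series(np.nan, index=series.index, dtype="float64")
    mask = series.notna()
    out[mask] = encoder.transform(series[mask])
    return out

train_sub = encode_keep_nan(train[column])
test_sub = encode_keep_nan(test[column])
```

Both routes give the same codes here, since the map is built from the same sorted values that the encoder is fitted on.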

Anyway, excellent work, I love it!!!

YangChaoKiKa, Mar 01 '18 08:03