Codes-for-WSDM-CUP-Music-Rec-1st-place-solution
label-encoder encoding missing values
Line 54 of id_process.py raises the following error:
Traceback (most recent call last):
File "", line 8, in
I guess the reason is that missing values have to be replaced before the label encoders are applied, as described in this post: https://stackoverflow.com/questions/36808434/label-encoder-encoding-missing-values
I added two lines and it now works fine:
train[column].replace(np.nan, 'NAN', inplace=True)
test[column].replace(np.nan, 'NAN', inplace=True)
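A minimal, self-contained sketch of the fix described above, for anyone who wants to try it outside the repository. The two small DataFrames and the column name `genre_ids` are made up for illustration, and fitting the encoder on the concatenation of train and test is an assumption about how the encoder is used, not a copy of id_process.py. It uses `fillna` with assignment instead of `replace(..., inplace=True)`, which has the same effect on the column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the real script reads the competition CSV files.
train = pd.DataFrame({"genre_ids": ["465", np.nan, "958"]})
test = pd.DataFrame({"genre_ids": ["958", np.nan, "465"]})
column = "genre_ids"

# Replace missing values with a placeholder string before encoding,
# so the encoder only ever sees strings.
train[column] = train[column].fillna("NAN")
test[column] = test[column].fillna("NAN")

# Assumption: fit on train and test together so both share one label space.
encoder = LabelEncoder()
encoder.fit(pd.concat([train[column], test[column]]))

train[column] = encoder.transform(train[column])
test[column] = encoder.transform(test[column])
```

Note that with this approach the missing values become an ordinary category of their own ("NAN"), so any later step that treats NaN specially will no longer see them as missing.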
ValueError Traceback (most recent call last)
/Users/xm/cs231/lib/python2.7/site-packages/sklearn/preprocessing/label.pyc in transform(self, y)
    151         if len(np.intersect1d(classes, self.classes_)) < len(classes):
    152             diff = np.setdiff1d(classes, self.classes_)
--> 153             raise ValueError("y contains new labels: %s" % str(diff))
    154         return np.searchsorted(self.classes_, y)
    155
ValueError: y contains new labels: [nan nan nan ..., nan nan nan]
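I ran into the same problem. As I understand it, judging by how the author handles NaN elsewhere in the code, NaN values are meant to be left in place and not processed. I don't fully understand how turning them into "NAN" affects the later groupby steps, so I hope the author can clarify this.
I downgraded sklearn from 0.19.1 to 0.18.1 and the label encoder still crashes on NaN. The workarounds are all ugly: one is to build a map and apply it to the column with column.apply(map); the other is to pull out the non-NaN subtable, transform it, and then assign the result back.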
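The two workarounds mentioned above could look roughly like the sketch below. Both leave NaN in place instead of turning it into a category. The toy data, the column name `genre_ids`, and the helper name `encode_keep_nan` are hypothetical, and this is one possible reading of the workarounds, not code from the repository.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real train/test tables.
train = pd.DataFrame({"genre_ids": ["465", np.nan, "958"]})
test = pd.DataFrame({"genre_ids": ["958", np.nan, "465"]})
column = "genre_ids"

# All non-missing values seen in either table.
values = sorted(pd.concat([train[column], test[column]]).dropna().unique())

# Workaround 1: build a plain dict and map the column through it.
# Values without a key in the dict (including NaN) come out as NaN.
mapping = {value: code for code, value in enumerate(values)}
train_mapped = train[column].map(mapping)

# Workaround 2: transform only the non-NaN rows and assign them back.
encoder = LabelEncoder().fit(values)

def encode_keep_nan(series, fitted_encoder):
    """Label-encode the non-missing entries; missing entries stay NaN."""
    encoded = pd.Series(np.nan, index=series.index, dtype="float64")
    mask = series.notna()
    encoded.loc[mask] = fitted_encoder.transform(series[mask])
    return encoded

train_encoded = encode_keep_nan(train[column], encoder)
test_encoded = encode_keep_nan(test[column], encoder)
```

Because NaN survives, the encoded column is float rather than integer. On the groupby question: pandas `groupby` drops rows whose key is NaN by default, whereas a "NAN" placeholder forms a group of its own, so the two approaches can produce different aggregate features.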
Anyway, excellent work, I love it!!!