mlframework
y contains previously unseen labels
After running the command "sh run.sh randomforest", I got the following error:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/kaggle/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/anaconda3/envs/kaggle/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/ekaratrattagan/Program/Course/machine_learning/kaggle/e01/src/train.py", line 45, in
    train_df.loc[:, c] = lbl.transform(train_df[c].values.tolist())
  File "/opt/anaconda3/envs/kaggle/lib/python3.8/site-packages/sklearn/preprocessing/_label.py", line 273, in transform
    _, y = _encode(y, uniques=self.classes_, encode=True)
  File "/opt/anaconda3/envs/kaggle/lib/python3.8/site-packages/sklearn/preprocessing/_label.py", line 117, in _encode
    return _encode_numpy(values, uniques, encode,
  File "/opt/anaconda3/envs/kaggle/lib/python3.8/site-packages/sklearn/preprocessing/_label.py", line 49, in _encode_numpy
    raise ValueError("y contains previously unseen labels: %s"
ValueError: y contains previously unseen labels: [nan, nan, nan, nan, nan, nan, nan, nan, ....
I fixed it by adding the following two lines in train.py, after "for c in train_df.columns:" and before "lbl = preprocessing.LabelEncoder()":

    train_df[c].replace(np.nan, 'NAN', inplace=True)
    valid_df[c].replace(np.nan, 'NAN', inplace=True)
With that change, the start of the loop looks like this:

    label_encoders = {}
    for c in train_df.columns:
        train_df[c].replace(np.nan, 'NAN', inplace=True)
        valid_df[c].replace(np.nan, 'NAN', inplace=True)
        lbl = preprocessing.LabelEncoder()
After that, it worked perfectly.
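
For anyone hitting the same error, here is a minimal, self-contained sketch of the same idea. It is not the actual train.py: the toy frames and column names are made up for illustration, it assigns the filled column back instead of using inplace=True on a column selection (which newer pandas versions may not apply to the parent frame), and it assumes the encoder is fitted on the labels from both frames.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data standing in for the real train/validation frames.
train_df = pd.DataFrame({"color": ["red", np.nan, "blue"], "size": ["S", "M", np.nan]})
valid_df = pd.DataFrame({"color": ["blue", np.nan], "size": ["L", "S"]})

label_encoders = {}
for c in train_df.columns:
    # Replace missing values with a sentinel string so that every label
    # is an ordinary string the encoder can match exactly.
    train_df[c] = train_df[c].fillna("NAN").astype(str)
    valid_df[c] = valid_df[c].fillna("NAN").astype(str)

    lbl = preprocessing.LabelEncoder()
    # Assumption: fit on the labels of both frames, so that neither
    # transform call encounters a label the encoder has not seen.
    lbl.fit(train_df[c].tolist() + valid_df[c].tolist())
    train_df[c] = lbl.transform(train_df[c].tolist())
    valid_df[c] = lbl.transform(valid_df[c].tolist())
    label_encoders[c] = lbl

print(train_df)
print(valid_df)
```

The sentinel "NAN" becomes its own category, so missing values are kept as information rather than dropped. Any string that cannot collide with a real label would work equally well.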