Issue with data prep

Open AIAdventures opened this issue 8 years ago • 3 comments

Hi Jim! Great project! I am just having trouble with the prep data module. I'm running it on Linux Mint.

andrewcz@andrewcz-PORTEGE-Z30t-B ~/Desktop/Numerai/numerai dataset/numerai_datasets (13)/numerai $ python prep_data.py
/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Fold #1
Traceback (most recent call last):
  File "prep_data.py", line 85, in <module>
    main()
  File "prep_data.py", line 50, in main
    rf.fit(X_split_train, y_split_train)
  File "/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'test'

Many thanks for your help, Andrew

AIAdventures, Apr 22 '17 04:04

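The data format has changed since last year. The file now contains some non-numeric columns (the 'test' string in your error comes from one of them), and they need to be dropped before fitting.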

I used this in tournament 72: feature_cols = ['feature'+str(i) for i in range(1, 22)]
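To make the column selection concrete, here is a minimal sketch using only the standard library. The sample rows and the non-feature column names (id, era, data_type, target) are assumptions for illustration, not taken from the actual tournament file, and select_features is a hypothetical helper. With pandas the same idea would be df[feature_cols].

```python
import csv
import io

# Hypothetical sample in an assumed layout: the id, era and data_type columns
# hold strings, which is what breaks a naive conversion of every column to float.
SAMPLE_CSV = """id,era,data_type,feature1,feature2,feature3,target
a1,era1,train,0.1,0.2,0.3,1
a2,era1,test,0.4,0.5,0.6,0
"""


def select_features(rows, n_features):
    """Return (X, feature_cols), keeping only the numeric feature columns."""
    # Same naming scheme as above: feature1 .. feature<n_features>.
    feature_cols = ['feature' + str(i) for i in range(1, n_features + 1)]
    # Only the selected columns are converted, so string columns never reach float().
    X = [[float(row[col]) for col in feature_cols] for row in rows]
    return X, feature_cols


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, feature_cols = select_features(rows, n_features=3)
print(feature_cols)
print(X)
```

For the tournament 72 data, n_features would be 21 to match range(1, 22).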

GillesVandewiele, Sep 10 '17 08:09

Yes, this code is pretty out of date now. I may update in the future as time allows.

jimfleming, Sep 10 '17 17:09

Hey @jimfleming, I adapted parts of your code to work with the current format. I'll try sending a PR in the near future!

GillesVandewiele, Sep 18 '17 11:09