sherlock-project
KeyError when running model.predict(X_test) in 02-1-train-and-test-sherlock.ipynb
Hello!
I am trying to use the pre-built 'sherlock' model to make predictions. As suggested in the README, I have run some of the cells in the 02-1-train-and-test-sherlock.ipynb notebook, but I get a KeyError when model.predict(X_test) is run.
Code to Reproduce:
```python
model_id = 'sherlock'

from ast import literal_eval
from collections import Counter
from datetime import datetime

import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, classification_report

from sherlock.deploy.model import SherlockModel

# Load the preprocessed test features and the raw test labels.
start = datetime.now()
print(f'Started at {start}')

X_test = pd.read_parquet('../data/processed/X_test.parquet')
y_test = pd.read_parquet('../data/raw/test_labels.parquet').values.flatten()
y_test = np.array([x.lower() for x in y_test])

print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')

# Initialise the pre-trained Sherlock model (architecture and weights).
start = datetime.now()
print(f'Started at {start}')

model = SherlockModel();
model.initialize_model_from_json(with_weights=True, model_id="sherlock");
print('Initialized model.')

print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')

# Predict semantic types for the test columns (this call raises the KeyError).
predicted_labels = model.predict(X_test)
predicted_labels = np.array([x.lower() for x in predicted_labels])
```
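The repro code imports f1_score and classification_report but never calls them; in the notebook they are presumably used to score the predictions once model.predict succeeds. A minimal sketch of that evaluation step, with small made-up label arrays standing in for y_test and predicted_labels (the labels and the choice of weighted averaging are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

# Hypothetical stand-ins for y_test and predicted_labels (already lower-cased).
y_true = np.array(["city", "name", "city", "country"])
y_pred = np.array(["city", "name", "country", "country"])

# Weighted F1: per-class F1 scores averaged with each class weighted by its support.
f1_weighted = f1_score(y_true, y_pred, average="weighted")
print(f"Weighted F1: {f1_weighted:.4f}")  # Weighted F1: 0.7500

# Per-class precision, recall and F1, plus macro and weighted averages.
print(classification_report(y_true, y_pred))
```

Passing average="macro" instead would weight every class equally, regardless of how many test columns it has.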
When model.predict(X_test) is run, the following KeyError occurs:
```
KeyError Traceback (most recent call last)
/var/folders/66/cbb21km104n7d7t9qf61q8rmrsjdc8/T/ipykernel_21846/2316637303.py in <module>
----> 1 predicted_labels = model.predict(X_test)
2 predicted_labels = np.array([x.lower() for x in predicted_labels])
~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict(self, X, model_id)
118 Array with predictions for X.
119 """
--> 120 y_pred = self.predict_proba(X, model_id)
121 y_pred_classes = helpers._proba_to_classes(y_pred, model_id)
122
~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict_proba(self, X, model_id)
141 y_pred = self.model.predict(
142 [
--> 143 X[feature_cols_dict["char"]].values,
144 X[feature_cols_dict["word"]].values,
145 X[feature_cols_dict["par"]].values,
KeyError: "['n_[^]-agg-sum', 'n_[^]-agg-max', 'n_[\\\\]-agg-kurtosis', 'n_[^]-agg-var', 'n_[\\\\]-agg-median', 'n_[^]-agg-kurtosis', 'n_[\\\\]-agg-mean', 'n_[\\\\]-agg-all', 'n_[^]-agg-min', 'n_[\\\\]-agg-sum', 'n_[^]-agg-median', 'n_[^]-agg-mean', 'n_[^]-agg-all', 'n_[\\\\]-agg-min', 'n_[\\\\]-agg-max', 'n_[^]-agg-any', 'n_[\\\\]-agg-var', 'n_[\\\\]-agg-any', 'n_[^]-agg-skewness', 'n_[\\\\]-agg-skewness'] not in index"
```
Is there something that I am missing or need to do prior to running the above code?
Appreciate the help!
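The KeyError is raised by pandas: selecting X[cols] with a list of labels fails if any label is absent from the DataFrame, and the message lists the absent ones. Here every missing name is an n_[^] or n_[\\] character-count feature, which hints that X_test.parquet does not carry the full set of feature columns the model expects. A minimal sketch for listing such gaps before calling predict; the toy DataFrame and the expected_char_cols list are hypothetical stand-ins for X_test and for the column list the model uses internally (feature_cols_dict["char"] in the traceback):

```python
from typing import List

import pandas as pd


def missing_feature_columns(X: pd.DataFrame, expected_cols: List[str]) -> List[str]:
    """Return the expected columns that are absent from X, in their expected order."""
    present = set(X.columns)
    return [col for col in expected_cols if col not in present]


# Hypothetical toy frame: only two of the three expected character features exist.
X_toy = pd.DataFrame({
    "n_[a]-agg-sum": [1.0, 2.0],
    "n_[b]-agg-sum": [0.0, 3.0],
})
expected_char_cols = ["n_[a]-agg-sum", "n_[b]-agg-sum", "n_[^]-agg-sum"]

print(missing_feature_columns(X_toy, expected_char_cols))  # ['n_[^]-agg-sum']

# Selecting with a list that contains an absent label raises KeyError,
# the same failure seen in predict_proba.
try:
    X_toy[expected_char_cols]
except KeyError as err:
    print("KeyError:", err)
```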
@lowecg @madelonhulsebos would you mind providing some guidance, please?
Hi Kenton,
Sorry for the delay; I missed your original post. I'll have a look at this in the morning.
To get the lay of the land:
It sounds like you've initialised the project and just run 02-1-train-and-test-sherlock.ipynb? Was that all you ran?
Could you confirm what version of Python you're running?
Cheers,
Chris.
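When answering a version question like this, it can help to report the interpreter and key library versions together. A small sketch that collects them; the choice of packages is an assumption, and others (for example TensorFlow, if it is installed) could be added the same way via their __version__ attribute:

```python
import platform

import numpy as np
import pandas as pd


def environment_report() -> dict:
    """Collect interpreter and library versions for a bug report."""
    return {
        "python": platform.python_version(),
        "numpy": np.__version__,
        "pandas": pd.__version__,
    }


for name, version in environment_report().items():
    print(f"{name}: {version}")
```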
Hi @KentonParton,
Apologies for my late response, but I plan to take a look at this tomorrow.
@lowecg, I believe I've encountered this issue before, but I will let you know if it turns out to be new to me.
Best, Madelon
Hi @KentonParton,
I ran your code and it works for me once I use the test data file created by running the notebook 01-data-processing.ipynb (this file is named test.parquet). Did you generate X_test.parquet with that notebook as well? What does it contain? Its head should be as follows:
[Screenshot 2022-04-23 at 10 29 09]
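One way to check whether X_test.parquet matches the file produced by 01-data-processing.ipynb is to compare their columns and shapes. A minimal sketch, using small in-memory frames as hypothetical stand-ins; with the real files you would load each one with pd.read_parquet (which needs a Parquet engine such as pyarrow). The reference path is not given in the thread, so it is left as a placeholder:

```python
from typing import List, Tuple

import pandas as pd


def column_differences(mine: pd.DataFrame, reference: pd.DataFrame) -> Tuple[List[str], List[str]]:
    """Return (columns only in `mine`, columns only in `reference`), each sorted."""
    only_mine = sorted(set(mine.columns) - set(reference.columns))
    only_ref = sorted(set(reference.columns) - set(mine.columns))
    return only_mine, only_ref


# Hypothetical stand-ins; in practice, for example:
#   mine = pd.read_parquet('../data/processed/X_test.parquet')
#   reference = pd.read_parquet('<path to test.parquet>')
mine = pd.DataFrame({"n_[a]-agg-sum": [1.0], "n_[b]-agg-sum": [2.0]})
reference = pd.DataFrame({"n_[a]-agg-sum": [1.0], "n_[b]-agg-sum": [2.0], "n_[^]-agg-sum": [0.0]})

only_mine, only_ref = column_differences(mine, reference)
print("Shapes:", mine.shape, reference.shape)  # Shapes: (1, 2) (1, 3)
print("Only in mine:", only_mine)              # Only in mine: []
print("Only in reference:", only_ref)          # Only in reference: ['n_[^]-agg-sum']

# Inspect the first rows of your own file.
print(mine.head())
```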
If you just want to test the model with some custom input, I recommend using the notebook 00-use-sherlock-out-of-the-box.ipynb.
Hi @KentonParton, did you solve the issue?