python-machine-learning-book-2nd-edition
AttributeError: Can't get attribute 'tokenizer' on <module '__main__'>
Hi,
I am trying to test the pickled objects, to verify that I can import the vectorizer and unpickle the classifier.
import pickle
import re
import os
from vectorizer import vect
clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))
However, I get this error:
----> 6 clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))
AttributeError: Can't get attribute 'tokenizer' on <module '__main__'>
What is going on and how can I fix this error?
Thank you!
Hm, this looks like a namespace issue; pickle is very sensitive to that. Do you have the tokenizer defined in the vectorizer.py file, like so?
import re
from sklearn.feature_extraction.text import HashingVectorizer

# `stop` (a list of stop words) is assumed to be defined earlier in vectorizer.py
def tokenizer(text):
    text = re.sub('<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',
                           text.lower())
    text = re.sub(r'[\W]+', ' ', text.lower()) \
        + ' '.join(emoticons).replace('-', '')
    tokenized = [w for w in text.split() if w not in stop]
    return tokenized

vect = HashingVectorizer(decode_error='ignore',
                         n_features=2**21,
                         preprocessor=None,
                         tokenizer=tokenizer)
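The namespace sensitivity comes from how pickle handles plain functions. It stores them by reference (module name plus function name), not by their code. An object that holds a `tokenizer` defined in a notebook or script is therefore saved as a pointer to `__main__.tokenizer`. The minimal sketch below reproduces the error with the standard library alone. It assumes that deleting the name before loading is a fair stand-in for unpickling in a separate script that never defined `tokenizer`.

```python
import pickle

def tokenizer(text):
    return text.lower().split()

# Pickle records the function by reference, as module + name
# (__main__.tokenizer when this file is run as a script), not its code.
payload = pickle.dumps(tokenizer)

# Simulate loading in a separate session where __main__ has no `tokenizer`.
del tokenizer

error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc

# An AttributeError mentioning 'tokenizer'; exact wording varies by Python version.
print(error)
```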
I also have this issue, and I can confirm my vectorizer.py looks like that.
For what it's worth, doing:
>>> import pickle
>>> import re
>>> import os
>>> from vectorizer import *
>>> clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))
>>> clf
seemed to work for me, thanks to some digging around here: https://stackoverflow.com/questions/40287657/load-pickled-object-in-different-file-attribute-error#comment67835396_40287657
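A plausible reason the star import helps: `from vectorizer import *` binds a function named `tokenizer` in the `__main__` namespace before unpickling. The stored `__main__.tokenizer` reference can then resolve. The sketch below imitates that step with the standard library only, simply by redefining the name. Whichever function happens to be bound under that name at load time is the one you get back.

```python
import pickle

def tokenizer(text):
    return text.lower().split()

payload = pickle.dumps(tokenizer)  # stored as a reference to __main__.tokenizer
del tokenizer                      # fresh session: loading would now fail

# Roughly what `from vectorizer import *` does: bind a function named
# `tokenizer` in the __main__ namespace before unpickling.
def tokenizer(text):
    return text.lower().split()

restored = pickle.loads(payload)
print(restored is tokenizer)    # True: the reference resolved to the new binding
print(restored("Hello World"))  # ['hello', 'world']
```

This works, but it ties the pickle to whatever the loading script happens to define under that name.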
Ah, I think I know the source of my issue: I was pickling `clf` after fitting the logistic regression model, whereas the textbook does the pickling after the SGDClassifier.
Thanks for the comments. Hm, that's weird; the SGDClassifier should behave exactly like the LogisticRegression classifier when it comes to pickling. In most cases, I think it's a namespace issue. I hope you were able to resolve it.
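One way to make such a reference resolve in any script is to define the tokenizer in an importable module (such as vectorizer.py) and import it from there before fitting and pickling. The pickle then records `vectorizer.tokenizer` instead of `__main__.tokenizer`, and pickle imports that module by name when loading. The minimal sketch below assumes a throwaway module named `vectorizer_demo` (hypothetical) written to a temporary directory.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for vectorizer.py, written to a temporary directory.
module_dir = tempfile.mkdtemp()
Path(module_dir, "vectorizer_demo.py").write_text(
    "def tokenizer(text):\n"
    "    return text.lower().split()\n"
)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make sure the freshly written file is found

vectorizer_demo = importlib.import_module("vectorizer_demo")
payload = pickle.dumps(vectorizer_demo.tokenizer)  # stored as vectorizer_demo.tokenizer

# Simulate a fresh session in which the module has not been imported yet.
del sys.modules["vectorizer_demo"]

# While loading, pickle imports vectorizer_demo by name, so the reference resolves.
restored = pickle.loads(payload)
print(restored.__module__)      # vectorizer_demo
print(restored("Hello World"))  # ['hello', 'world']
```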
Also, I realized that by pickling the earlier classifier, I had actually pickled the Pipeline object...
https://github.com/rasbt/python-machine-learning-book-2nd-edition/blob/faaacc278924bf3ff0f27b56cc8962a94e90d0f4/code/ch08/ch08.py#L485
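That would explain the difference. A fitted classifier on its own stores only learned numbers. A Pipeline also stores its vectorizer and, through it, a reference to `tokenizer`. The stdlib-only sketch below uses plain dicts as hypothetical stand-ins for the two objects, so no scikit-learn is needed. It lists the global references each pickle would have to resolve when loaded. Protocol 2 is chosen only because it emits plain `GLOBAL` opcodes that `pickletools.genops` reports as `"module name"` strings.

```python
import pickle
import pickletools

def tokenizer(text):
    return text.lower().split()

# Hypothetical stand-ins: a fitted classifier keeps only learned numbers,
# while a Pipeline also keeps the vectorizer and its tokenizer callable.
weights_only = {"coef": [0.25, -1.5], "intercept": [0.1]}
pipeline_like = {"vect": {"tokenizer": tokenizer}, "clf": weights_only}

def global_refs(obj):
    """Return the 'module name' references a pickle of obj must resolve on load."""
    payload = pickle.dumps(obj, protocol=2)  # protocol 2 emits plain GLOBAL opcodes
    return [arg for opcode, arg, _ in pickletools.genops(payload)
            if opcode.name == "GLOBAL"]

print(global_refs(weights_only))   # []
print(global_refs(pipeline_like))  # ['__main__ tokenizer'] when run as a script
```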