python-machine-learning-book-2nd-edition

AttributeError: Can't get attribute 'tokenizer' on <module '__main__'>

[Open] NielsHoogeveen1990 opened this issue 6 years ago · 6 comments

Hi,

I am trying to test the pickled objects to verify that I can import the vectorizer and unpickle the classifier:

import pickle
import re
import os
from vectorizer import vect

clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))

However, I get this error:

----> 6 clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))

AttributeError: Can't get attribute 'tokenizer' on <module '__main__'>

What is going on and how can I fix this error?

Thank you!

NielsHoogeveen1990, Oct 12 '18
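A minimal, stdlib-only sketch of how this error can arise, assuming the object in `classifier.pkl` was pickled in a session (for example, a Jupyter notebook) where `tokenizer` was defined at the top level, so its `__module__` was `'__main__'`. To avoid touching the real `__main__`, a throwaway module with the hypothetical name `notebook_main` plays that role, and a plain dict stands in for the fitted object that holds the tokenizer; no scikit-learn objects are involved.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the notebook's __main__ module.
nb = types.ModuleType("notebook_main")
sys.modules["notebook_main"] = nb

def tokenizer(text):
    """Toy tokenizer (not the book's): lowercase and split on whitespace."""
    return text.lower().split()

# Pretend the function was defined at the top level of notebook_main.
tokenizer.__module__ = "notebook_main"
tokenizer.__qualname__ = "tokenizer"
nb.tokenizer = tokenizer

# A plain dict standing in for a fitted object that holds the tokenizer.
payload = pickle.dumps({"n_features": 2**21, "tokenizer": tokenizer})

# Simulate a fresh session in which 'tokenizer' was never defined.
del nb.tokenizer

error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
print(error)
```

The exact wording of the message depends on the Python version, but it names the missing `tokenizer` attribute and the module pickle tried to look it up in.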

Hm, this looks like a namespace issue; pickle is very sensitive to that. Do you have the tokenizer defined in the vectorizer.py file, like so?

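   import os
   import pickle
   import re

   from sklearn.feature_extraction.text import HashingVectorizer

   # stop words saved to pkl_objects/stopwords.pkl alongside classifier.pkl
   cur_dir = os.path.dirname(__file__)
   stop = pickle.load(open(os.path.join(cur_dir, 'pkl_objects', 'stopwords.pkl'), 'rb'))


   def tokenizer(text):
       # strip HTML tags, keep emoticons, drop stop words
       text = re.sub(r'<[^>]*>', '', text)
       emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',
                              text.lower())
       text = re.sub(r'[\W]+', ' ', text.lower()) \
                     + ' '.join(emoticons).replace('-', '')
       tokenized = [w for w in text.split() if w not in stop]
       return tokenized


   vect = HashingVectorizer(decode_error='ignore',
                            n_features=2**21,
                            preprocessor=None,
                            tokenizer=tokenizer)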

rasbt, Oct 12 '18
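To make the namespace point concrete: pickle serializes a plain function by reference, recording only its module name and qualified name, never its code. The sketch below is stdlib-only; the module name `by_ref_demo` and the function `other_tokenizer` are hypothetical. It rebinds the name after pickling and shows that unpickling simply returns whatever `by_ref_demo.tokenizer` is at load time.

```python
import pickle
import sys
import types

# Hypothetical module that owns the function being pickled.
mod = types.ModuleType("by_ref_demo")
sys.modules["by_ref_demo"] = mod

def tokenizer(text):
    """Toy tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

tokenizer.__module__ = "by_ref_demo"
tokenizer.__qualname__ = "tokenizer"
mod.tokenizer = tokenizer

data = pickle.dumps(tokenizer)  # stores only "by_ref_demo" + "tokenizer"

# Rebind the name to a different function *after* pickling.
def other_tokenizer(text):
    return text.split(",")

mod.tokenizer = other_tokenizer

restored = pickle.loads(data)
print(restored is other_tokenizer)  # True
print(restored("a,b"))              # ['a', 'b']
```

So for a pickle that records `__main__.tokenizer`, the script doing the loading must have a `tokenizer` in its own `__main__` namespace.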

I also have this issue. I can confirm that vectorizer.py looks like that.

raybellwaves, Jul 21 '19

For what it's worth, doing:

>>> import pickle
>>> import re
>>> import os
>>> from vectorizer import *
>>> clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))
>>> clf

This seemed to work for me, thanks to some digging around here: https://stackoverflow.com/questions/40287657/load-pickled-object-in-different-file-attribute-error#comment67835396_40287657

raybellwaves, Jul 21 '19
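A sketch of why `from vectorizer import *` can help, assuming `classifier.pkl` records a reference to `__main__.tokenizer`: the star import binds the name `tokenizer` in the importing script's namespace, which is exactly where pickle looks. Here the module `session_main` is a hypothetical stand-in for the loading script's `__main__`, and the assignment `session.tokenizer = vectorizer_tokenizer` mimics what the import does; it is not a literal star import.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the __main__ module of the notebook/script.
session = types.ModuleType("session_main")
sys.modules["session_main"] = session

def tokenizer(text):
    """Toy tokenizer (not the book's): lowercase and split on whitespace."""
    return text.lower().split()

# Pickled while 'tokenizer' lived at the top level of session_main.
tokenizer.__module__ = "session_main"
tokenizer.__qualname__ = "tokenizer"
session.tokenizer = tokenizer
payload = pickle.dumps({"tokenizer": tokenizer})

# Later session: the original definition is gone ...
del session.tokenizer

# ... but a same-named function is available from vectorizer.py (simulated here).
def vectorizer_tokenizer(text):
    return text.lower().split()

# Roughly what `from vectorizer import *` does: bind the name in __main__.
session.tokenizer = vectorizer_tokenizer

restored = pickle.loads(payload)
print(restored["tokenizer"] is vectorizer_tokenizer)  # True
print(restored["tokenizer"]("Great Movie"))           # ['great', 'movie']
```

The restored object now calls whichever `tokenizer` is bound at load time, so the definition in vectorizer.py has to match the one that was used when the object was fitted.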

Ah, I think I know the source of my issue. I was pickling the clf right after the logistic regression model, whereas the textbook does the pickling after the SGDClassifier.

raybellwaves, Jul 21 '19

Thanks for the comments. Hm, that's weird; the SGDClassifier should behave exactly like the LogisticRegression classifier when it comes to pickling. In most cases, I think it's a namespace issue. Hope you were able to resolve it.

rasbt, Jul 21 '19

Also, I realized that I had pickled the Pipeline object by pickling the earlier classifier: https://github.com/rasbt/python-machine-learning-book-2nd-edition/blob/faaacc278924bf3ff0f27b56cc8962a94e90d0f4/code/ch08/ch08.py#L485

raybellwaves, Jul 27 '19
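If an object that holds a tokenizer (such as the earlier Pipeline) really does need to be pickled, one way to remove the dependence on `__main__` is to define the function in an importable module before fitting and pickling, so the pickle records `<module>.tokenizer` instead of `__main__.tokenizer`. The stdlib-only sketch below writes a hypothetical module `toy_vectorizer.py` to a temporary directory, uses a plain dict in place of a fitted estimator, and then drops the module from `sys.modules` to show that `pickle.loads` re-imports it by name. Pickling only the classifier, as the textbook does with the SGDClassifier, sidesteps the issue differently: the vectorizer is rebuilt from vectorizer.py rather than unpickled.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Write a hypothetical module file that plays the role of vectorizer.py.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "toy_vectorizer.py").write_text(textwrap.dedent("""
    def tokenizer(text):
        # toy tokenizer: lowercase and split on whitespace
        return text.lower().split()
"""))
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

toy_vectorizer = importlib.import_module("toy_vectorizer")

# The pickle records 'toy_vectorizer.tokenizer', not '__main__.tokenizer'.
payload = pickle.dumps({"n_features": 2**21, "tokenizer": toy_vectorizer.tokenizer})

# Simulate a fresh process in which the module has not been imported yet.
del sys.modules["toy_vectorizer"]

restored = pickle.loads(payload)  # pickle imports toy_vectorizer by name
print(restored["tokenizer"]("Great Movie"))  # ['great', 'movie']
print(b"__main__" in payload)                # False
```

The loading script then only needs the module to be importable (on `sys.path`); it no longer has to define or import `tokenizer` into its own namespace.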