After training my own classifier with nltk, how do I load it in textblob?
The built-in classifier in textblob is pretty dumb. It's trained on movie reviews, so I created a huge set of examples in my context (57,000 stories, categorized as positive or negative) and then trained it using nltk. I tried using textblob to train it but it always failed:
from textblob.classifiers import NaiveBayesClassifier
with open('train.json', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="json")
That would run for hours and end in a memory error.
I looked at the source and found it was just using nltk and wrapping that, so I used that instead, and it worked.
The training set for nltk needs to be a list of tuples. The first element of each tuple is a Counter mapping each word in the text to its frequency of appearance, and the second element is 'pos' or 'neg' for the sentiment.
>>> train_set = [(Counter(i["text"].split()),i["label"]) for i in data[200:]]
>>> test_set = [(Counter(i["text"].split()),i["label"]) for i in data[:200]] # withholding 200 examples for testing later
>>> cl = nltk.NaiveBayesClassifier.train(train_set) # <-- this is the same thing textblob was using
>>> print("Classifier accuracy percent:",(nltk.classify.accuracy(cl, test_set))*100)
('Classifier accuracy percent:', 66.5)
>>> cl.show_most_informative_features(75)
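To make the training-set structure concrete, here is a minimal sketch of what a single entry looks like. The two stories are made up for illustration (my real data has about 57,000), and to_featureset is just a hypothetical helper name for the Counter(text.split()) step above.

```python
from collections import Counter

# Made-up stand-ins for the real data, same keys as in my JSON.
data = [
    {"text": "the rescue was a great success", "label": "pos"},
    {"text": "the storm caused great damage", "label": "neg"},
]


def to_featureset(text):
    """Split on whitespace and count how often each word appears."""
    return Counter(text.split())


# A list of (featureset, label) tuples, the shape nltk trains on.
labeled = [(to_featureset(item["text"]), item["label"]) for item in data]

features, label = labeled[0]
print(label)                       # pos
print(features["great"])           # 1
print(isinstance(features, dict))  # True: Counter is a dict subclass
```

So each featureset is dict-like, which matters later when the classifier is asked to classify something.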
Then I pickled it.
with open('storybayes.pickle','wb') as f:
    pickle.dump(cl,f)
Next, I took the pickled file and reopened it to get the nltk classifier back (<class 'nltk.classify.naivebayes.NaiveBayesClassifier'>), then tried to feed it into textblob. Instead of
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
blob = TextBlob("I love this library", analyzer=NaiveBayesAnalyzer())
I tried:
>>> import cPickle as pickle
>>> with open('storybayes.pickle','rb') as f:
        cl = pickle.load(f)
>>> type(cl)
<class 'nltk.classify.naivebayes.NaiveBayesClassifier'>
>>> from textblob import TextBlob
>>> blob = TextBlob("I love this library", classifier=cl)
>>> blob
TextBlob("I love this library")
>>> blob.classify()
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    blob.classify()
  File "C:\python\lib\site-packages\textblob\blob.py", line 412, in classify
    return self.classifier.classify(self.raw)
  File "C:\python\lib\site-packages\nltk\classify\naivebayes.py", line 88, in classify
    return self.prob_classify(featureset).max()
  File "C:\python\lib\site-packages\nltk\classify\naivebayes.py", line 94, in prob_classify
    featureset = featureset.copy()
AttributeError: 'str' object has no attribute 'copy'
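My reading of the traceback is that blob.classify() hands the raw string (self.raw) straight to the classifier, while the nltk classifier wants the same dict-like featureset it was trained on. Below is a minimal sketch of that mismatch using a stand-in class instead of my real pickled classifier. StandInClassifier and CounterAdapter are names I made up, the toy rule inside is not how Naive Bayes works, and I haven't confirmed that textblob accepts a wrapper like this for its classifier argument.

```python
from collections import Counter


class StandInClassifier:
    """Hypothetical stand-in for the unpickled nltk classifier.

    Like the real one in the traceback, it calls .copy() on its input,
    so it needs a dict-like featureset rather than a str.
    """

    def classify(self, featureset):
        featureset = featureset.copy()
        # Toy rule, only here to make the sketch runnable.
        return "pos" if featureset.get("love", 0) > 0 else "neg"


class CounterAdapter:
    """Hypothetical wrapper: turns raw text into a Counter before classifying."""

    def __init__(self, classifier):
        self.classifier = classifier

    def classify(self, text):
        # Same feature extraction used when building train_set.
        return self.classifier.classify(Counter(text.split()))


inner = StandInClassifier()

try:
    # Roughly what happens when the raw string reaches the classifier.
    inner.classify("I love this library")
except AttributeError as err:
    print("raw string fails:", err)

wrapped = CounterAdapter(inner)
print(wrapped.classify("I love this library"))  # pos
```

If that reading is right, passing something like CounterAdapter(cl) as the classifier might get past the error, but I'd like to know whether there is a supported way to do this.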
I also posted this here: https://stackoverflow.com/questions/50828262/after-training-my-own-classifier-with-nltk-how-do-i-load-it-in-textblob