Documents not sortable in py3

Open • pachewise opened this issue 5 years ago • 1 comment

We're having an issue with pattern==3.6 on Python 3: if the model documents contain duplicates (or otherwise end up with tied scores), the nsmallest call in vector_space_search fails:

```python
from pattern.en import lexeme
from pattern.vector import Document, LEMMA, TFIDF, Model

# Note that the second and last responses are identical.
responses = [
    'it is works great.  ',
    'bristles are soft and compact enough',
    'the aftertaste isnt as bad as others. ',
    'i dont know. it isnt something i think about.',
    'bristles are soft and compact enough',
]
exclude = ['t', 'im']
docs = [
    Document(response, stemmer=LEMMA, name=str(i), exclude=exclude, stopwords=False)
    for i, response in enumerate(responses)
]
m = Model(documents=docs, weight=TFIDF)
results = m.search(words=lexeme('bristle'), top=100)
```

Results in:

image
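
The failure mode can probably be reproduced without pattern at all. The sketch below assumes the search ranks `(score, document)` tuples with `heapq.nsmallest`, as the mention of nsmallest above suggests; `Doc` and `rank` are hypothetical stand-ins, not pattern's actual code.

```python
import heapq


class Doc:
    """Hypothetical stand-in for a document class that defines no ordering methods."""

    def __init__(self, name):
        self.name = name


def rank(pairs, top=10):
    # Tuples compare element by element, so a tie on the score
    # falls through to comparing the Doc objects themselves.
    return heapq.nsmallest(top, pairs)


a, b = Doc("1"), Doc("4")

# Distinct scores: the Doc objects are never compared, so this works.
ok = rank([(0.2, a), (0.5, b)])

# Tied scores (e.g. two identical documents): Python 3 refuses to order
# two Doc instances and raises TypeError.
try:
    rank([(0.2, a), (0.2, b)])
    tie_error = None
except TypeError as exc:
    tie_error = exc
```

With distinct scores the ranking succeeds; only the tied pair triggers the exception, which would match the "duplicates" condition described above.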

(If you're wondering why this works in py2, see https://docs.python.org/2/library/stdtypes.html#comparisons) image
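
As a possible workaround, the ranking step could compare scores only, so that tied documents are never compared with each other. This is a minimal sketch under the same assumption (a ranking over `(score, document)` tuples); `Doc` and `rank_by_score` are hypothetical names, and this is not the library's own fix.

```python
import heapq
from operator import itemgetter


class Doc:
    """Hypothetical stand-in for a document class that defines no ordering methods."""

    def __init__(self, name):
        self.name = name


def rank_by_score(pairs, top=10):
    # key=itemgetter(0) makes nsmallest order by the score alone;
    # ties keep their input order and the Doc objects are never compared.
    return heapq.nsmallest(top, pairs, key=itemgetter(0))


a, b, c = Doc("1"), Doc("4"), Doc("2")
pairs = [(0.2, a), (0.2, b), (0.1, c)]

best_two = rank_by_score(pairs, top=2)
everything = rank_by_score(pairs, top=100)
```

Passing a `key` is the smallest change that sidesteps the comparison; defining ordering methods on the document class would be another option, at the cost of deciding what "less than" means for two documents.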

pachewise, Apr 13 '19 20:04

See also #62

Bounty here: https://github.com/clips/pattern/issues/62#issuecomment-391473725

tuxayo, Feb 26 '20 22:02