Eric Kafe
@Higgs32584, I don't know the full problem scope, there could be more... Nor do I know the best place to do the substitution, but I have verified that it works...
The cause of the problem is that the last two lines under ENDING_QUOTE handle contractions, using a regular expression that requires the contraction to be followed by a plain...
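Setting the exact NLTK patterns aside (the rules below are heavily simplified stand-ins, not the real ENDING_QUOTE regexes), a toy version shows the failure mode: a rule that insists on a particular character after the contraction misses contractions followed by control characters, while a `\b` boundary handles both:

```python
import re

# Hypothetical simplification of a contraction-splitting rule.
# The strict version only splits when a plain space follows:
STRICT = re.compile(r"\b(\w+)(n't) ")
# The relaxed version accepts any word boundary, including
# control characters like \v, or end of string:
RELAXED = re.compile(r"\b(\w+)(n't)\b")

def split(text, pat):
    """Apply the contraction rule, then whitespace-tokenize."""
    return pat.sub(r"\1 \2 ", text).split()

print(split("I can't\vgo", STRICT))   # ['I', "can't", 'go'] -- contraction kept whole
print(split("I can't\vgo", RELAXED))  # ['I', 'ca', "n't", 'go']
print(split("I can't go", STRICT))    # ['I', 'ca', "n't", 'go']
```

(The "ca" / "n't" split is the usual Penn Treebank convention for "can't".)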
Thanks @Higgs32584, this looks good. Test cases are always much appreciated everywhere.
@alvations and @53X, a more consistent interpretation of pos=None could be nice, but in that case, the default should not be "n", but rather "Any pos". Please consider the morphy()...
Ideally, to get consistent behaviour across the Wordnet Morphy-related wrappers, "WordNetLemmatizer.lemmatize()" could just be an alias for the morphy() wrapper from wordnet.py. Actually, I find that the name "WordNetLemmatizer"...
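A minimal sketch of that proposal, using a stub lemma table in place of the real Wordnet data (so this runs without NLTK; the table entries are invented): pos=None means "any pos", trying every part of speech instead of silently defaulting to "n", and lemmatize() is just a thin alias over morphy():

```python
# Stub lemma table standing in for Wordnet's morphy exceptions and
# detachment rules; the real morphy() consults the Wordnet database.
LEMMAS = {
    ("running", "v"): "run",
    ("better", "a"): "good",
    ("geese", "n"): "goose",
}
ALL_POS = ("n", "v", "a", "r", "s")

def morphy(word, pos=None):
    """pos=None means 'any pos': try them all, return the first hit."""
    poses = ALL_POS if pos is None else (pos,)
    for p in poses:
        lemma = LEMMAS.get((word, p))
        if lemma is not None:
            return lemma
    return None

class WordNetLemmatizer:
    # lemmatize() as an alias for the morphy() wrapper, falling
    # back to the surface form when nothing matches.
    def lemmatize(self, word, pos=None):
        return morphy(word, pos) or word

wnl = WordNetLemmatizer()
print(wnl.lemmatize("running"))       # run (found under pos='v')
print(wnl.lemmatize("running", "n"))  # running (no noun lemma in the stub)
```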
[PR #3225](https://github.com/nltk/nltk/pull/3225#issuecomment-1890890747) proposes to add two standard "morphy" modes to the WordNetLemmatizer class, for users who want a standard _morphy_ lemmatizer with a more consistent pos argument. On the...
Yes @ndvbd, the "use_morphy" argument is not even in the latest NLTK version, though it was proposed in [issue 18](https://github.com/nltk/wordnet/pull/18#issue-513042920l), and sounds like a good idea...
Actually, the problem is not whether or not to use morphy, but rather to prevent morphy from recursively stripping the same suffix many times. PR #3225 fixes it.
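The failure mode can be sketched without Wordnet at all (the rule table and function names below are illustrative, not the real _morphy_ code): an unguarded loop keeps re-applying rules to its own output, stripping "s" again and again, while checking candidates against a lexicon stops after one detachment:

```python
# Illustrative detachment rules in morphy's style (not the real table).
RULES = [("sses", "ss"), ("es", "e"), ("es", ""), ("s", "")]
LEXICON = {"glass", "go", "church"}

def lemmatize_naive(word):
    # Buggy: re-applies rules to its own output with no lexicon
    # check, so the same suffix gets stripped repeatedly.
    changed = True
    while changed:
        changed = False
        for suf, repl in RULES:
            if word.endswith(suf):
                word = word[: -len(suf)] + repl
                changed = True
                break
    return word

def lemmatize_fixed(word):
    # One round of candidate forms, validated against the lexicon,
    # so each suffix is stripped at most once.
    if word in LEXICON:
        return word
    for suf, repl in RULES:
        if word.endswith(suf):
            cand = word[: -len(suf)] + repl
            if cand in LEXICON:
                return cand
    return word

print(lemmatize_naive("glasses"))  # 'gla' -- over-stripped
print(lemmatize_fixed("glasses"))  # 'glass'
```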
_word_tokenize_ also fails to split contractions followed by `[\a\b\v]`.
This needs more work in order to return the lemmas of the synset targets, as in the last example from [this comment](https://github.com/nltk/nltk/issues/1970#issue-301709671) by @marcevrard. Alternatively, depending on the consensus, the...