Eric Kafe
@Higgs32584, I don't know the full problem scope, there could be more... Nor do I know the best place to do the substitution, but I have verified that it works...
The cause of the problem is that the last two lines under ENDING_QUOTE handle contractions, using a regular expression that requires the contraction to be followed by a plain...
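Setting the exact NLTK patterns aside (the rules below are heavily simplified stand-ins, not the real ENDING_QUOTE regexes), a toy version shows the failure mode: a rule that insists on a particular character after the contraction misses contractions followed by control characters, while a `\b` boundary handles both:

```python
import re

# Hypothetical simplification of a contraction-splitting rule.
# The strict version only splits when a plain space follows:
STRICT = re.compile(r"\b(\w+)(n't) ")
# The relaxed version accepts any word boundary, including
# control characters like \v, or end of string:
RELAXED = re.compile(r"\b(\w+)(n't)\b")

def split(text, pat):
    """Apply the contraction rule, then whitespace-tokenize."""
    return pat.sub(r"\1 \2 ", text).split()

print(split("I can't\vgo", STRICT))   # ['I', "can't", 'go'] -- contraction kept whole
print(split("I can't\vgo", RELAXED))  # ['I', 'ca', "n't", 'go']
print(split("I can't go", STRICT))    # ['I', 'ca', "n't", 'go']
```

(The "ca" / "n't" split is the usual Penn Treebank convention for "can't".)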
Thanks @Higgs32584, this looks good. Test cases are always much appreciated everywhere.
@alvations and @53X, a more consistent interpretation of pos=None could be nice, but in that case, the default should not be "n", but rather "Any pos". Please consider the morphy()...
Ideally, to get consistent behaviour across the Wordnet Morphy-related wrappers, "WordNetLemmatizer.lemmatize()" could just be an alias for the morphy() wrapper from wordnet.py. Actually, I find that the name "WordNetLemmatizer"...
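A minimal sketch of that proposal, using a stub lemma table in place of the real Wordnet data (so this runs without NLTK; the table entries are invented): pos=None means "any pos", trying every part of speech instead of silently defaulting to "n", and lemmatize() is just a thin alias over morphy():

```python
# Stub lemma table standing in for Wordnet's morphy exceptions and
# detachment rules; the real morphy() consults the Wordnet database.
LEMMAS = {
    ("running", "v"): "run",
    ("better", "a"): "good",
    ("geese", "n"): "goose",
}
ALL_POS = ("n", "v", "a", "r", "s")

def morphy(word, pos=None):
    """pos=None means 'any pos': try them all, return the first hit."""
    poses = ALL_POS if pos is None else (pos,)
    for p in poses:
        lemma = LEMMAS.get((word, p))
        if lemma is not None:
            return lemma
    return None

class WordNetLemmatizer:
    # lemmatize() as an alias for the morphy() wrapper, falling
    # back to the surface form when nothing matches.
    def lemmatize(self, word, pos=None):
        return morphy(word, pos) or word

wnl = WordNetLemmatizer()
print(wnl.lemmatize("running"))       # run (found under pos='v')
print(wnl.lemmatize("running", "n"))  # running (no noun lemma in the stub)
```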
[PR #3225](https://github.com/nltk/nltk/pull/3225#issuecomment-1890890747) proposes to add two standard "morphy" modes to the WordNetLemmatizer class, for users who want a standard _morphy_ lemmatizer with a more consistent pos argument. On the...
Yes @ndvbd, the "use_morphy" argument is not even in the latest NLTK version, though it was proposed in [issue 18](https://github.com/nltk/wordnet/pull/18#issue-513042920l), and sounds like a good idea...
Actually, the problem is not whether or not to use morphy, but rather to prevent morphy from recursively stripping the same suffix many times. PR #3225 fixes it.
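The failure mode can be sketched without Wordnet at all (the rule table and function names below are illustrative, not the real _morphy_ code): an unguarded loop keeps re-applying rules to its own output, stripping "s" again and again, while checking candidates against a lexicon stops after one detachment:

```python
# Illustrative detachment rules in morphy's style (not the real table).
RULES = [("sses", "ss"), ("es", "e"), ("es", ""), ("s", "")]
LEXICON = {"glass", "go", "church"}

def lemmatize_naive(word):
    # Buggy: re-applies rules to its own output with no lexicon
    # check, so the same suffix gets stripped repeatedly.
    changed = True
    while changed:
        changed = False
        for suf, repl in RULES:
            if word.endswith(suf):
                word = word[: -len(suf)] + repl
                changed = True
                break
    return word

def lemmatize_fixed(word):
    # One round of candidate forms, validated against the lexicon,
    # so each suffix is stripped at most once.
    if word in LEXICON:
        return word
    for suf, repl in RULES:
        if word.endswith(suf):
            cand = word[: -len(suf)] + repl
            if cand in LEXICON:
                return cand
    return word

print(lemmatize_naive("glasses"))  # 'gla' -- over-stripped
print(lemmatize_fixed("glasses"))  # 'glass'
```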
_word_tokenize_ also fails to split contractions followed by `[\a\b\v]`.
This needs more work in order to return the lemmas of the synset targets, as in the last example from [this comment](https://github.com/nltk/nltk/issues/1970#issue-301709671) by @marcevrard. Alternatively, depending on the consensus, the...