Extension to other POS Taxonomies Beyond Nouns

Open ejshieh opened this issue 7 years ago • 2 comments

First off, thank you for building sematch! This package has been incredibly valuable for me.

Suggestion / question - is there any reason why WordNetSimilarity is restricted to only nouns at the moment? I noticed that synsets seem to be restricted to nouns only, but WordNet includes verb taxonomies also.

Relevant code:

  • Default parameter wn.NOUN here: https://github.com/gsi-upm/sematch/blob/master/sematch/semantic/similarity.py#L232
  • word_similarity does not override the default parameter: https://github.com/gsi-upm/sematch/blob/master/sematch/semantic/similarity.py#L334-L335 (in fact, it appears that all callers of word2synset assume that the input word is a noun)

nltk.corpus.wordnet.synsets doesn't require the POS argument to be passed in (it defaults to nltk.corpus.wordnet.POS_LIST), so I think a potentially nice extension would be to remove the restriction on measuring similarity between nouns only.
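To make the suggestion concrete, here is a minimal sketch of what a POS-agnostic lookup could look like. It is not sematch's code: `word_similarity_any_pos`, `lookup`, and `measure` are hypothetical names, and NLTK's `path_similarity` is only a stand-in for whichever metric sematch would actually use. The only NLTK calls assumed are `wn.synsets(word)`, `Synset.pos()`, and `Synset.path_similarity()`.

```python
from itertools import product


def word_similarity_any_pos(w1, w2, lookup=None, measure=None):
    """Hypothetical helper: best similarity over same-POS synset pairs.

    lookup(word)    -> iterable of synsets (default: nltk's wn.synsets,
                       called without a pos argument so all POS are searched).
    measure(s1, s2) -> float or None (default: Synset.path_similarity).
    """
    if lookup is None:
        # Imported lazily so the helper can be tried without the WordNet data.
        from nltk.corpus import wordnet as wn
        lookup = wn.synsets
    if measure is None:
        def measure(s1, s2):
            return s1.path_similarity(s2)

    best = 0.0
    for s1, s2 in product(lookup(w1), lookup(w2)):
        # Noun and verb taxonomies are separate, so skip mixed-POS pairs.
        if s1.pos() != s2.pos():
            continue
        score = measure(s1, s2)
        if score is not None and score > best:
            best = score
    return best
```

With the WordNet corpus installed, `word_similarity_any_pos('sit', 'lounge')` would compare the verb senses as well as the noun senses instead of looking at nouns only. The exact score depends on the metric chosen.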

ejshieh • Feb 16 '18 10:02

An example of what this looks like right now:

>>> from sematch.semantic.similarity import WordNetSimilarity
>>> wns = WordNetSimilarity()
>>> wns.word_similarity('sit', 'lounge')
0

ejshieh • Feb 16 '18 10:02

I'm not the original author, but I agree with you. I believe there is no technical reason for this; there just wasn't a use case that needed similarity between other POS.

The change is fairly straightforward. I will get back to it once we achieve Py3 compatibility.

balkian • Oct 26 '20 21:10