
Use modified version of WordNet

Open mrmechko opened this issue 5 years ago • 2 comments

If I wanted to use my own modified version of WordNet, or perhaps a different hierarchy with this system, where would I start?

I notice that the java code uses JWI, but I'm trying to figure out if the core system actually needs the full wordnet hierarchy or just the tags.

mrmechko avatar Nov 25 '19 19:11 mrmechko

Hi!

So, it's true that we currently rely a lot on the WordNet hierarchy. If you want to use another sense inventory, here are some tips:

  • No need to change the Python code: everything WordNet-related is located in the Java files.
  • Four Java classes need to be changed: NeuralWSDPrepare (the main class) and NeuralDataPreparator, which prepare the training data and configure the neural network, and NeuralWSDDecode and NeuralDisambiguator, which use a trained neural network to decode new text.
  • Track the usage of the WordnetHelper and WordnetUtils classes: they are in charge of everything WordNet-related.
  • In general, we use WordNet for two things: listing the possible senses of a word given its lemma (so the neural network predicts a sense among these possibilities only), and converting senses to synsets and/or to compressed synsets (as in our article).
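To illustrate the last point, here is a minimal sketch of restricting a prediction to the senses listed for a lemma. It is not taken from the project: the class name `CandidateRestrictedDecoder`, the method `decode`, and the map-based inputs are all hypothetical, and the network's output is assumed to be available as one score per sense.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration: none of these names come from the project.
class CandidateRestrictedDecoder {

    // Returns the highest-scoring sense among the candidates listed for the lemma,
    // or null when the inventory has no entry for that lemma.
    static String decode(String lemma,
                         Map<String, List<String>> sensesByLemma,
                         Map<String, Double> scoreBySense) {
        List<String> candidates = sensesByLemma.get(lemma);
        if (candidates == null || candidates.isEmpty()) {
            return null;
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String sense : candidates) {
            // Senses the network gave no score to are treated as the worst option.
            double score = scoreBySense.getOrDefault(sense, Double.NEGATIVE_INFINITY);
            if (best == null || score > bestScore) {
                best = sense;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Whatever sense inventory replaces WordNet only has to supply the lemma-to-candidates mapping for this step; a sense that scores highly but belongs to another lemma is never returned.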

I know it would be great to have a clear interface for using any sense inventory, and it's not too difficult, but I don't have the time to make the changes right now. It's planned for 2020 (after I finish my PhD, actually ^^). If you want to work on it, I would be glad to take pull requests :) I think the best way to achieve this would be to replace all the "WordNetStuff" with a generic "SenseInventoryStuff", so the code stays globally the same, and we would then provide different implementations of the SenseInventory.
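As a rough sketch of what such a generic interface could look like, assuming the three operations listed above (candidate senses for a lemma, sense to synset, synset to compressed label) are all the rest of the code needs. The interface `SenseInventory` borrows its name from the suggestion above, but its methods, and the toy class `MapSenseInventory`, are hypothetical and do not exist in the repository.

```java
import java.util.List;
import java.util.Map;

// Hypothetical interface: the method names are illustrative, not the project's API.
interface SenseInventory {
    // Candidate senses for a lemma; empty if the lemma is unknown.
    List<String> sensesOf(String lemma);

    // Synset (or equivalent concept) that a sense belongs to; null if unknown.
    String synsetOf(String sense);

    // Compressed label used as the network's output class.
    String compress(String synset);
}

// Toy in-memory implementation, standing in for a WordNet-backed
// or otherwise custom implementation.
class MapSenseInventory implements SenseInventory {
    private final Map<String, List<String>> sensesByLemma;
    private final Map<String, String> synsetBySense;
    private final Map<String, String> compressedBySynset;

    MapSenseInventory(Map<String, List<String>> sensesByLemma,
                      Map<String, String> synsetBySense,
                      Map<String, String> compressedBySynset) {
        this.sensesByLemma = sensesByLemma;
        this.synsetBySense = synsetBySense;
        this.compressedBySynset = compressedBySynset;
    }

    @Override
    public List<String> sensesOf(String lemma) {
        return sensesByLemma.getOrDefault(lemma, List.of());
    }

    @Override
    public String synsetOf(String sense) {
        return synsetBySense.get(sense);
    }

    @Override
    public String compress(String synset) {
        // A synset without a compressed form is kept as its own label.
        return compressedBySynset.getOrDefault(synset, synset);
    }
}
```

With this shape, the preparation and decoding classes would depend only on `SenseInventory`, and switching hierarchies would mean passing a different implementation.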

loic-vial avatar Nov 26 '19 10:11 loic-vial

Hi, fellow PhD student here, hoping to finish in 2020 too.

I decided to sidestep the issue for now by using sense compression. The TRIPS ontology has mappings from WordNet, so I'm just replacing the hypernym compression algorithm with TRIPS compression. That does violate the invariant described in the paper (that no compression should result in losing a unique word sense), but it seems to be working pretty well.
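One way to measure how often a replacement compression breaks that invariant is sketched below. It reads the invariant as "two senses of the same lemma must never receive the same compressed label", which is an interpretation rather than a quote from the paper. The class `CompressionCheck`, its method, and the map-based inputs are hypothetical, and any labels passed in are placeholders, not real TRIPS types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: not part of the project or of any TRIPS tooling.
class CompressionCheck {

    // Returns, in alphabetical order, the lemmas for which at least two senses
    // are mapped to the same compressed label. Assumes each sense is listed
    // once per lemma.
    static List<String> lemmasWithCollisions(Map<String, List<String>> sensesByLemma,
                                             Map<String, String> labelBySense) {
        List<String> collisions = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : sensesByLemma.entrySet()) {
            Set<String> seen = new HashSet<>();
            for (String sense : entry.getValue()) {
                // A sense without a mapping keeps its own identity as its label.
                String label = labelBySense.getOrDefault(sense, sense);
                if (!seen.add(label)) {
                    collisions.add(entry.getKey());
                    break;
                }
            }
        }
        Collections.sort(collisions);
        return collisions;
    }
}
```

An empty result would mean the compression still keeps every sense of every lemma distinguishable; a non-empty one lists the lemmas where the decoder can no longer tell some senses apart.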

mrmechko avatar Nov 26 '19 20:11 mrmechko