Updated approx_pos method in dataset.py

Open hendrixjoseph opened this issue 2 years ago • 0 comments

Updated ParsedDictionaryDefinitionDataset approx_pos method in dataset.py.

Something appears to have changed in how a stanza.models.common.doc.Word is structured, which causes the method approx_pos(cls, nlp, sentence, lookup_idx, lookup_len) to fail.

The Word object now looks something like:

{
  "id": 6,
  "text": "a",
  "upos": "DET",
  "xpos": "DT",
  "feats": "Definite=Ind|PronType=Art",
  "start_char": 23,
  "end_char": 24
}

The upside is that start_char and end_char can now be read directly from the Word object, without using a regex.
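To illustrate the idea, here is a minimal sketch of looking up a part of speech by character offsets. It is not the actual approx_pos implementation: WordStub and pos_at_span are hypothetical names, and WordStub only mimics the fields shown in the object above so the example runs without stanza installed. The overlap rule (first word whose span intersects the lookup span) is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WordStub:
    """Hypothetical stand-in with the same fields as the Word shown above."""
    text: str
    upos: str
    start_char: int
    end_char: int


def pos_at_span(words: Iterable[WordStub], lookup_idx: int, lookup_len: int) -> Optional[str]:
    """Return the upos of the first word overlapping [lookup_idx, lookup_idx + lookup_len)."""
    lookup_end = lookup_idx + lookup_len
    for word in words:
        # Two half-open intervals overlap when each starts before the other ends.
        if word.start_char < lookup_end and lookup_idx < word.end_char:
            return word.upos
    return None  # no word covers the requested span


# Made-up offsets for illustration; "a" matches the example object above.
words = [
    WordStub("is", "AUX", 20, 22),
    WordStub("a", "DET", 23, 24),
    WordStub("test", "NOUN", 25, 29),
]
print(pos_at_span(words, 23, 1))
```

With real stanza output, the same comparison would presumably run over the words of a parsed sentence, reading start_char and end_char as attributes instead of parsing them out of a string.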

I've tested the change in Google Colab.

hendrixjoseph, May 06 '22 13:05