NeuralGym

Hello. I have a question on how to make training data (NER)

Open • NuealYoon opened this issue 5 years ago • 1 comment

I was looking at the train_data.txt file to train the model.

("""It's a visually stunning movie, finding moments both macro and micro to highlight the beautiful imagination that "Star Wars" can evoke.""", {
    'words': ['It', "'s", 'a', 'visually', 'stunning', 'movie', ',', 'finding', 'moments', 'both', 'macro', 'and', 'micro', 'to', 'highlight', 'the', 'beautiful', 'imagination', 'that', '"', 'Star', 'Wars', '"', 'can', 'evoke', '.'],
    'entities': [(25, 30, 'PRODUCT'), (114, 123, 'WORK_OF_ART')],
    'heads': [1, 1, 5, 4, 5, 1, 1, 1, 7, 10, 8, 10, 10, 14, 7, 17, 17, 14, 24, 24, 21, 24, 23, 24, 17, 1],
    'deps': ['nsubj', 'ROOT', 'det', 'advmod', 'amod', 'attr', 'punct', 'advcl', 'dobj', 'preconj', 'amod', 'cc', 'conj', 'aux', 'advcl', 'det', 'amod', 'dobj', 'mark', 'punct', 'compound', 'nsubj', 'punct', 'aux', 'relcl', 'punct'],
    'tags': ['PRP', 'VBZ', 'DT', 'RB', 'JJ', 'NN', ',', 'VBG', 'NNS', 'CC', 'JJ', 'CC', 'JJ', 'TO', 'VB', 'DT', 'JJ', 'NN', 'IN', '``', 'NNP', 'NNS', "''", 'MD', 'VB', '.'],
    'cats': {'POSITIVE': True, 'NEGATIVE': False}
})
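Reading the data, 'words', 'heads', 'deps' and 'tags' appear to be parallel lists with one entry per token: 'tags' holds Penn Treebank part-of-speech tags, 'deps' holds dependency labels, and 'heads' holds the index of each token's head word (the ROOT token "'s" points at itself). 'cats' is a separate per-text label dict for text classification. This layout seems to match the annotation dict that spaCy v3's `Example.from_dict` accepts, but check the docs for your spaCy version. Below is a minimal sketch using only the first seven tokens of the example, truncated for brevity; `describe_tokens` is a hypothetical helper, not part of spaCy.

```python
# First seven tokens of the example above, truncated for brevity.
words = ['It', "'s", 'a', 'visually', 'stunning', 'movie', ',']
heads = [1, 1, 5, 4, 5, 1, 1]   # index of each token's syntactic head
deps = ['nsubj', 'ROOT', 'det', 'advmod', 'amod', 'attr', 'punct']
tags = ['PRP', 'VBZ', 'DT', 'RB', 'JJ', 'NN', ',']


def describe_tokens(words, heads, deps, tags):
    """Hypothetical helper: return one (word, tag, dep, head_word) row per token."""
    # Every per-token list must have exactly one entry per word.
    n = len(words)
    for name, values in (('heads', heads), ('deps', deps), ('tags', tags)):
        if len(values) != n:
            raise ValueError(f"{name} has {len(values)} items, expected {n}")
    rows = []
    for i, word in enumerate(words):
        # In this data, heads are absolute token indices into 'words'.
        rows.append((word, tags[i], deps[i], words[heads[i]]))
    return rows


for row in describe_tokens(words, heads, deps, tags):
    print(*row, sep='\t')
```

Each printed row reads as "word, tag, dependency label, head word", e.g. "a" is a determiner (`det`) attached to "movie".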

  1. What do the numbers in 'entities' mean?

  2. Is there documentation that explains what 'heads', 'deps', 'tags', and 'cats' are?

 Thanks for reading.

NuealYoon • Jul 23 '20 07:07

  1. The numbers in 'entities' are character offsets within the sentence: each tuple is (start, end, label), with start inclusive and end exclusive.
  2. You will find the answers in the spaCy documentation on training.
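To illustrate the first answer, a minimal standard-library sketch: slicing the text with each (start, end) pair recovers the labelled span, so (25, 30) is "movie" and (114, 123) is "Star Wars". `char_span` here is a hypothetical helper (not part of spaCy) that builds such a tuple from a phrase instead of counting characters by hand.

```python
TEXT = """It's a visually stunning movie, finding moments both macro and micro to highlight the beautiful imagination that "Star Wars" can evoke."""
ENTITIES = [(25, 30, 'PRODUCT'), (114, 123, 'WORK_OF_ART')]

# start is inclusive and end is exclusive, so TEXT[start:end] is the entity.
for start, end, label in ENTITIES:
    print(repr(TEXT[start:end]), label)


def char_span(text, phrase, label, occurrence=0):
    """Hypothetical helper: (start, end, label) for the n-th match of phrase."""
    start = -1
    for _ in range(occurrence + 1):
        # Search just past the previous match to find the next occurrence.
        start = text.find(phrase, start + 1)
        if start == -1:
            raise ValueError(f"{phrase!r} not found")
    return (start, start + len(phrase), label)


print(char_span(TEXT, 'Star Wars', 'WORK_OF_ART'))
```

Offsets should also line up with token boundaries. In spaCy, `Doc.char_span` returns `None` when they do not, which is a quick way to catch a misaligned entity.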

d5555 • Aug 13 '20 12:08