
"invalid whitespace entity spans" error while validating training and test data for NER

Open • abrarsharif66 opened this issue 1 year ago • 1 comment

How to reproduce the behaviour

I used the following piece of code to convert JSON to spaCy format. While validating with spacy --debug, I get a whitespace error:

image

Please help me resolve this.

    for text, annot in tqdm(TRAIN_DATA['annotations']):
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annot["entities"]:
            span = doc.char_span(start, end, label=label, alignment_mode="contract")
            if span is None:
                print("Skipping entity")
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    db.to_disk("train_data.spacy")
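A possible cause, offered as a sketch rather than a confirmed diagnosis: as far as I understand, spaCy's data validation flags entity spans that consist of, start with, or end with whitespace, which can happen when the annotated character offsets include a stray space or newline. The helper below, `trim_entity_offsets`, is a hypothetical name and not part of spaCy; it only shrinks the offsets until the covered text has clean boundaries. The example text is made up for illustration.

```python
def trim_entity_offsets(text, start, end):
    """Shrink [start, end) so text[start:end] has no leading or trailing whitespace.

    Hypothetical helper, not a spaCy API. Returns None when the span is
    empty or contains only whitespace.
    """
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return (start, end) if start < end else None


# Illustrative text, not taken from the training data.
text = "Name:  Jane Doe \n"
print(trim_entity_offsets(text, 5, 16))   # offsets padded with spaces
print(trim_entity_offsets(text, 15, 17))  # whitespace-only span
```

In the conversion loop, the trimmed offsets could be passed to `doc.char_span` in place of the raw ones, skipping the entity when the helper returns `None`.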

Info about spaCy

  • spaCy version: 3.7.5
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Pipelines: en_core_web_lg (3.7.1), en_core_web_sm (3.7.1)

abrarsharif66 · Nov 11 '24 07:11

Sample JSON from my training data, to show the schema:

{"classes":["SOFTWARE_NAME","JOB_TYPE","EDUCATION","UNIVERSITY","DEGREE","YEARS_OF_EXPERIENCE","STATE","CITY","COUNTRY","PROGRAMING_CONCEPT","COMPANY_NAME","PROGRAMMING_LANGUAGE","FRAMEWORKS","SOFT_SKILLS","JOB_TITLE","NAME","EMAIL","PH.NO"],"annotations":[["Zixuan Wu [email protected]",{"entities":[[0,9,"NAME"],[10,27,"EMAIL"]]}],["1363 Briones Ct | Pleasanton, CA 94588 | (510) 676-7461",{"entities":[[41,55,"PH.NO"]]}]]}

abrarsharif66 · Nov 11 '24 07:11
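To locate which annotations might trigger the error, one option is to scan the JSON for spans whose text is empty or has whitespace at either edge. This is a minimal sketch that assumes the schema shown in the sample; `find_whitespace_spans` is a hypothetical name. Only the phone-number annotation is copied here, because the e-mail text is masked on this page and its offsets cannot be checked.

```python
import json

# Abbreviated copy of the sample: "classes" and the first annotation are omitted.
sample = json.loads("""
{"annotations": [
  ["1363 Briones Ct | Pleasanton, CA 94588 | (510) 676-7461",
   {"entities": [[41, 55, "PH.NO"]]}]
]}
""")


def find_whitespace_spans(annotations):
    """Yield (text_index, start, end, label, snippet) for suspicious spans."""
    for i, (text, annot) in enumerate(annotations):
        for start, end, label in annot["entities"]:
            snippet = text[start:end]
            # Flag empty spans and spans with whitespace at either edge.
            if not snippet or snippet != snippet.strip():
                yield i, start, end, label, snippet


problems = list(find_whitespace_spans(sample["annotations"]))
print(problems)  # [] because the phone-number span has clean boundaries

# A deliberately bad span for comparison: it starts one character early,
# on the space before "(510)".
bad = [[sample["annotations"][0][0], {"entities": [[40, 55, "PH.NO"]]}]]
print(list(find_whitespace_spans(bad)))
```

Running the same scan over the full training and test files would list the candidate spans to inspect or trim.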