ner-annotator
JSON generated by NER Annotator doesn't seem to work with the spaCy converter
See comment at https://github.com/tecoholic/ner-annotator/discussions/43#discussioncomment-2917902
Hello there!
A simple function to convert the JSON generated by ner-annotator directly into a DocBin would be this one:
from spacy.tokens import DocBin
import spacy
import json
from tqdm import tqdm

nlp = spacy.blank("en")

def load_data(file):
    # Read the NER Annotator export and return its list of (text, annotations) pairs
    with open(file, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data["annotations"]

train_data = load_data("./data/annotation_1.json")
valid_data = load_data("./data/annotation_3.json")

def create_training(TRAIN_DATA):
    db = DocBin()
    for text, annot in tqdm(TRAIN_DATA):
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annot["entities"]:
            # Turn the character offsets into a Span, contracting to token boundaries
            span = doc.char_span(start, end, label=label,
                                 alignment_mode="contract")
            if span is None:
                print("Skipping entity")
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db

train_data = create_training(train_data)
train_data.to_disk("./data/train2.spacy")
valid_data = create_training(valid_data)
valid_data.to_disk("./data/valid2.spacy")
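As a quick sanity check after writing the files, you can load one of the DocBins back and inspect its entities. This is just a minimal sketch, assuming the ./data/train2.spacy path from the snippet above and that it contains at least one document:

from spacy.tokens import DocBin
import spacy

nlp = spacy.blank("en")

# Load the serialized DocBin back from disk
db = DocBin().from_disk("./data/train2.spacy")
docs = list(db.get_docs(nlp.vocab))

print(f"{len(docs)} docs in the DocBin")
# Entities of the first document (assumes the file is non-empty)
print([(ent.text, ent.label_) for ent in docs[0].ents])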
PS: Great job on the app, I love it.
@MikhailKlemin Thank you for coming up with the solution.
I am facing issues saving it to disk as a .spacy file, what should I do? Thanks in advance
Resolved!!
Hey @ankitladva11! Glad to know your problem was resolved. When you have the time, could you please leave a comment describing your issue and how you managed to resolve it? It would be useful to future users who might face the same issue. TIA!
@dreji18 also has a nicely documented approach to getting spaCy to work with the NER Annotator export.