JSON generated by NER Annotator doesn't seem to work with Spacy convertor

Open tecoholic opened this issue 2 years ago • 6 comments

See comment at https://github.com/tecoholic/ner-annotator/discussions/43#discussioncomment-2917902

tecoholic avatar Jun 10 '22 02:06 tecoholic

Hello there!

A simple function to convert the JSON generated by ner-annotator directly to a spaCy DocBin would be this one:

import json

import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

nlp = spacy.blank("en")


def load_data(file):
    # The labelled examples sit under the "annotations" key of the export.
    with open(file, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data["annotations"]


train_data = load_data("./data/annotation_1.json")
valid_data = load_data("./data/annotation_3.json")


def create_training(TRAIN_DATA):
    db = DocBin()
    for text, annot in tqdm(TRAIN_DATA):
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annot["entities"]:
            # With alignment_mode="contract", char_span returns None when
            # no token lies fully inside the character range.
            span = doc.char_span(start, end, label=label,
                                 alignment_mode="contract")
            if span is None:
                print(f"Skipping entity {label!r} at {start}-{end}")
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db


train_data = create_training(train_data)
train_data.to_disk("./data/train2.spacy")
valid_data = create_training(valid_data)
valid_data.to_disk("./data/valid2.spacy")
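
If the conversion fails on some records, it may help to clean the export before passing it to `create_training`. The sketch below is only an illustration under a few assumptions: that the export can contain empty (null) records, and that overlapping or out-of-range offsets are what trips the conversion, since spaCy rejects overlapping spans when assigning `doc.ents`. `clean_annotations` is a hypothetical helper, not part of NER Annotator or spaCy, and its greedy overlap filter is a simple stand-in rather than spaCy's own `spacy.util.filter_spans`.

```python
import json


def clean_annotations(annotations):
    """Drop empty records and overlapping or out-of-range entity offsets."""
    cleaned = []
    for record in annotations:
        if not record:  # assumption: some exports may contain null entries
            continue
        text, annot = record
        kept = []
        last_end = 0
        # Sort by start offset (longest span first on ties), then keep only
        # spans that start at or after the end of the previously kept span.
        ordered = sorted(
            annot.get("entities", []), key=lambda e: (e[0], -(e[1] - e[0]))
        )
        for start, end, label in ordered:
            if 0 <= start < end <= len(text) and start >= last_end:
                kept.append([start, end, label])
                last_end = end
        cleaned.append([text, {"entities": kept}])
    return cleaned


# Made-up sample in the same shape that load_data() returns.
sample = json.loads(
    '{"annotations": [["Ada lives in Paris", {"entities": '
    '[[0, 3, "PERSON"], [13, 18, "GPE"], [13, 16, "GPE"]]}], null]}'
)
print(clean_annotations(sample["annotations"]))
```

With something like this in place, `create_training(clean_annotations(load_data(path)))` would only see non-empty records with non-overlapping entities.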

PS: Great job on the app, I love it.

MikhailKlemin avatar Jun 17 '22 21:06 MikhailKlemin

@MikhailKlemin Thank you for coming up with the solution.

tecoholic avatar Jun 18 '22 18:06 tecoholic

I am facing issues saving it to disk as a .spacy file. What should I do? Thanks in advance.

ankitladva11 avatar Jun 22 '23 11:06 ankitladva11

> I am facing issues saving it to disk as a .spacy file. What should I do? Thanks in advance.

Resolved!!

ankitladva11 avatar Jun 22 '23 11:06 ankitladva11

Hey @ankitladva11! Glad to know your problem was resolved. When you have the time, could you please leave a comment describing your issue and how you managed to resolve it? It would be useful to future users who might face the same issue. TIA!

alvi-khan avatar Jun 22 '23 15:06 alvi-khan

@dreji18 also has a nicely documented approach to getting spaCy to work with the NER Annotator export.

Annotate your data for NER Training 📣

DaanDeSmedt avatar Feb 28 '24 10:02 DaanDeSmedt