ner-annotator
Generate MsgPack export/import
We should try to reimplement the MsgPack format from spaCy. https://msgpack.org/ should be helpful. Maybe also implement import.
I think the current format that spaCy uses for NER data is DocBin. I don't know if there is an open spec that would allow reading and writing this format. Maybe reading the spaCy code will help.
Either way, I don't see a big need for msgpack.
The DocBin format is a gzipped MsgPack https://spacy.io/api/docbin
@leonkunert Ah.. I should have RTFD. Thanks for pointing that out. Then this is something that should definitely be implemented.
The tokens, spaces and lengths fields could be difficult, since they are serialized numpy arrays.
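To make the numpy concern a bit more concrete, here is a rough sketch of what reading and writing those raw buffers could look like without numpy. It rests on assumptions that should be checked against the spaCy source before relying on them: that lengths is a flat int32 array, that tokens is a row-major uint64 matrix with one column per attribute, that spaces stores one byte per token, and that the bytes are little-endian. The helper names are made up for illustration and are not part of spaCy.

```python
import struct


def pack_int32_array(values):
    """Pack ints as raw little-endian int32 bytes (assumed layout of `lengths`)."""
    return struct.pack(f"<{len(values)}i", *values)


def unpack_int32_array(raw):
    """Inverse of pack_int32_array: 4 bytes per value."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}i", raw))


def unpack_uint64_matrix(raw, n_cols):
    """Read a row-major uint64 matrix (assumed layout of `tokens`).

    n_cols would be the number of serialized attributes.
    """
    count = len(raw) // 8
    flat = struct.unpack(f"<{count}Q", raw)
    return [list(flat[i:i + n_cols]) for i in range(0, count, n_cols)]


def unpack_bools(raw):
    """Read one boolean per byte (assumed layout of `spaces`)."""
    return [byte != 0 for byte in raw]


# Two documents with 3 and 2 tokens respectively.
lengths_raw = pack_int32_array([3, 2])
print(unpack_int32_array(lengths_raw))
```

If these assumptions hold, the same buffers would map onto typed arrays (Int32Array, BigUint64Array, Uint8Array) on the JavaScript side, so the arrays themselves may be less of a problem than the attribute and string-hash bookkeeping around them.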