doccano-transformer
Not writing all entities in to_conll2003
How to reproduce the behaviour
I can't share the data because it's confidential, but some entities simply aren't written when using that function over documents!
After digging further into why so much data was lost, I realized that some annotations included spaces (or -) at their beginning or end, which prevented correct tokenization and the associated labeling.
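As a possible workaround, the annotation spans could be trimmed before conversion. The following is only a minimal sketch, assuming each record follows the doccano JSONL export shape with a `text` string and `labels` given as `[start, end, label]` character offsets. The helpers `trim_span` and `clean_labels` are hypothetical names used for illustration and are not part of doccano-transformer.

```python
def trim_span(text, start, end, strip_chars=" \t\n-"):
    """Shrink [start, end) so it neither starts nor ends with a padding character."""
    while start < end and text[start] in strip_chars:
        start += 1
    while end > start and text[end - 1] in strip_chars:
        end -= 1
    return start, end


def clean_labels(text, labels):
    """Return labels with padded boundaries trimmed (assumed [start, end, label] layout)."""
    cleaned = []
    for start, end, label in labels:
        new_start, new_end = trim_span(text, start, end)
        if new_start < new_end:  # drop spans that contained only padding
            cleaned.append([new_start, new_end, label])
    return cleaned


# Made-up example: each span carries a stray leading or trailing space.
text = "Alice works at Acme-Corp in Paris."
labels = [[0, 6, "PER"], [14, 24, "ORG"], [27, 33, "LOC"]]
cleaned = clean_labels(text, labels)
```

Only the boundaries are trimmed, so a hyphen inside an entity such as "Acme-Corp" is left alone. Whether trimming `-` is appropriate depends on the annotation scheme, so `strip_chars` may need adjusting.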