
Adding CoNLL format with POS tags and lemmatization

Open · alabrashJr opened this issue 4 years ago · 0 comments

The BIO tags have been converted to BILOU tags, still in CoNLL format, using [bio2bilou.py](https://github.com/ufal/acl2019_nested_ner/blob/master/bio2bilou.py).
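To illustrate what the BIO-to-BILOU step does, here is a minimal Python sketch. It is not the linked bio2bilou.py script. The function name `bio_to_bilou` is hypothetical, and the sketch assumes a single layer of well-formed BIO tags per sentence, meaning every `I-X` continues an entity of the same type `X`. In BILOU, a multi-token entity is tagged B (begin), I (inside), and L (last), and a single-token entity is tagged U (unit).

```python
from typing import List


def bio_to_bilou(tags: List[str]) -> List[str]:
    """Convert one sentence of BIO tags (e.g. "B-PER", "I-PER", "O") to BILOU.

    Hypothetical helper, not the bio2bilou.py script. Assumes one layer of
    well-formed BIO input: every I-X continues an entity of the same type X.
    """
    bilou: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append("O")
            continue
        prefix, label = tag.split("-", 1)
        # Look ahead: the entity continues only if the next tag is I- of the same type.
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{label}"
        if prefix == "B":
            # B stays B for multi-token entities; a one-token entity becomes U.
            bilou.append(f"B-{label}" if continues else f"U-{label}")
        else:
            # I stays I inside an entity; the final token of the entity becomes L.
            bilou.append(f"I-{label}" if continues else f"L-{label}")
    return bilou


print(bio_to_bilou(["B-PER", "I-PER", "O", "B-LOC", "B-FAC", "I-FAC", "I-FAC"]))
```

For that input the sketch returns `["B-PER", "L-PER", "O", "U-LOC", "B-FAC", "I-FAC", "L-FAC"]`. For nested annotation, a conversion like this would have to be applied to each label layer separately.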

Lemmas and POS tags were produced with [UDPipe](http://ufal.mff.cuni.cz/udpipe).
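UDPipe writes its analysis in the CoNLL-U format (columns ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Assuming the UDPipe output was saved as CoNLL-U, the following sketch shows one way to pull out the FORM, LEMMA, and XPOS columns used here. The function name `conllu_to_form_lemma_xpos` is hypothetical and is not part of UDPipe.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (FORM, LEMMA, XPOS)


def conllu_to_form_lemma_xpos(lines: Iterable[str]) -> List[List[Triple]]:
    """Collect (FORM, LEMMA, XPOS) triples per sentence from CoNLL-U lines.

    Hypothetical helper. CoNLL-U columns are tab-separated:
    ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC.
    """
    sentences: List[List[Triple]] = []
    current: List[Triple] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges (1-2) and empty nodes (3.1)
        current.append((cols[1], cols[2], cols[4]))
    if current:
        sentences.append(current)
    return sentences
```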

The columns are as follows.

  • FORM: Word form or punctuation symbol.
  • LEMMA: Lemma or stem of word form.
  • XPOS: Language-specific part-of-speech tag; underscore if not available.
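  • LABELS: Entity labels in BILOU format; when a token carries several (nested) labels, they are joined with a vertical bar (|).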
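A reader for this four-column layout might look like the sketch below. The issue does not state the column separator or how sentences are delimited, so the sketch assumes tab-separated columns and a blank line between sentences; the names `Token` and `read_four_column` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    xpos: str
    labels: List[str]  # one or more BILOU labels, e.g. ["B-FAC", "U-PER"]


def read_four_column(lines: Iterable[str], sep: str = "\t") -> List[List[Token]]:
    """Parse FORM, LEMMA, XPOS, LABELS lines into sentences of Tokens.

    Hypothetical helper. Assumes `sep`-separated columns, blank lines between
    sentences, and nested labels joined with "|".
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        form, lemma, xpos, labels = line.split(sep)
        # Split the label column on the vertical bar to recover nested labels.
        current.append(Token(form, lemma, xpos, labels.split("|")))
    if current:
        sentences.append(current)
    return sentences
```

Keeping the labels as a list preserves every nesting layer, so a downstream model can choose to read only the outermost label or all of them.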

alabrashJr · Nov 19 '21 13:11