litbank
Adding CoNLL format with POS tags and lemmatization
The BIO annotations have been converted to BILOU within the CoNLL format using [bio2bilou.py](https://github.com/ufal/acl2019_nested_ner/blob/master/bio2bilou.py).
Lemmatization and POS tagging were done using [UDPipe](http://ufal.mff.cuni.cz/udpipe).
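
To illustrate what the BIO to BILOU conversion does, here is a minimal sketch for a single flat layer of tags. It is not the linked `bio2bilou.py` script; the function name `bio_to_bilou` is hypothetical, and the sketch assumes well-formed BIO input (every `I-` tag follows a `B-` or `I-` tag with the same label).

```python
def bio_to_bilou(tags):
    """Convert one flat sequence of BIO tags to BILOU tags.

    Illustrative sketch only, not the bio2bilou.py script referenced above.
    """
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append("O")
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # Single-token entities become U- (unit), others keep B-.
            bilou.append(f"B-{label}" if continues else f"U-{label}")
        else:
            # The final token of a multi-token entity becomes L- (last).
            bilou.append(f"I-{label}" if continues else f"L-{label}")
    return bilou


print(bio_to_bilou(["B-PER", "I-PER", "O", "B-LOC"]))
```

In BILOU, single-token entities are marked `U-` and the final token of a longer entity is marked `L-`, so entity boundaries are explicit in each tag.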
The columns are as follows.
- FORM: Word form or punctuation symbol.
- LEMMA: Lemma or stem of word form.
- XPOS: Language-specific part-of-speech tag; underscore if not available.
- LABELS: Entity labels in BILOU format, joined with a vertical bar (`|`).
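
The column layout above can be read with a few lines of Python. This is a sketch under stated assumptions: columns are taken to be tab-separated, the labels column is split on `|`, and the `Token` type, the `parse_line` helper, and the sample line are all hypothetical, not taken from the data files.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str          # FORM: word form or punctuation symbol
    lemma: str         # LEMMA: lemma or stem of the word form
    xpos: str          # XPOS: language-specific POS tag, "_" if unavailable
    labels: List[str]  # labels column, split on the bar


def parse_line(line, sep="\t"):
    """Parse one token line; the tab separator is an assumption."""
    form, lemma, xpos, labels = line.rstrip("\n").split(sep)
    return Token(form, lemma, xpos, labels.split("|"))


# Hypothetical sample line, made up for illustration.
token = parse_line("London\tLondon\tNNP\tU-GPE|O\n")
print(token.form, token.xpos, token.labels)
```

If the files use a different separator (for example a single space), pass it through the `sep` argument.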