streusle
Format extension: incorporating annotator notes?
The version of STREUSLE in Xposition contains some annotator notes on P tokens that are not included in the official release. The notes can help clarify the interpretation of the text, provide the annotator's rationale, or help cluster different usages at a finer level of granularity than the supersenses.
Should the .conllulex format have a place for these? An extra column? Or, since notes are rare, a sentence header row?
Should there also be a standard for releasing rich annotation history metadata (such as who annotated which token, original vs. adjudicated annotations, timestamps, ...)?
Maybe notes should be in a standoff TSV format (similar to tquery.py output) that gets ingested into the JSON?
Distinguish token notes (tnote), lexical expression notes (lnote), sentence notes (snote)?
Allow notes for arbitrary subsets of a sentence's tokens (e.g. "this was considered but rejected as an MWE")?
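To make the standoff idea concrete, here is a minimal sketch of how a notes TSV could be merged into the JSON. Everything about the layout is an assumption for discussion, not an existing STREUSLE convention: the column names (`sent_id`, `type`, `toks`, `note`), the `notes` key added to each sentence, the helper name `attach_notes`, and the simplified sentence dicts (only `sent_id` and `toks`). The sample sentence and notes are invented. The `toks` column holds a comma-separated list of token numbers, so one mechanism covers a single token (tnote), a lexical expression (lnote), an arbitrary subset of tokens, and, when left empty, the whole sentence (snote).

```python
import csv
import io

# Hypothetical standoff file: one note per row, tab-separated.
# The `toks` column is empty for sentence-level notes.
NOTES_TSV = (
    "sent_id\ttype\ttoks\tnote\n"
    "demo-1\ttnote\t3\tborderline between two supersenses\n"
    "demo-1\tlnote\t2,3\tconsidered but rejected as an MWE\n"
    "demo-1\tsnote\t\tsentence is a fragment\n"
)

NOTE_TYPES = {"tnote", "lnote", "snote"}


def attach_notes(sentences, tsv_text):
    """Add a `notes` list to each sentence that has rows in the standoff TSV."""
    by_id = {sent["sent_id"]: sent for sent in sentences}
    # QUOTE_NONE keeps quotation marks inside note text literal.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["type"] not in NOTE_TYPES:
            raise ValueError(f"unknown note type: {row['type']!r}")
        sent = by_id[row["sent_id"]]  # KeyError if the sentence is missing
        toks = [int(t) for t in row["toks"].split(",")] if row["toks"] else []
        valid = {tok["#"] for tok in sent["toks"]}
        if not set(toks) <= valid:
            raise ValueError(f"token numbers out of range in {row['sent_id']}")
        sent.setdefault("notes", []).append(
            {"type": row["type"], "toks": toks, "note": row["note"]}
        )
    return sentences


# Simplified stand-in for the JSON: token numbers are stored under "#".
sentences = [
    {
        "sent_id": "demo-1",
        "toks": [
            {"#": 1, "word": "Went"},
            {"#": 2, "word": "out"},
            {"#": 3, "word": "of"},
            {"#": 4, "word": "town"},
        ],
    }
]
annotated = attach_notes(sentences, NOTES_TSV)
```

Keeping notes in a separate file would leave the .conllulex columns unchanged, at the cost of having to keep sentence IDs and token numbers in sync between the two files.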