Auto-CORPus
Optional split of headers into separate passages
Headers are contained in the infons of a passage in the BioC file. This means they do not have a text or annotation field of their own and cannot be visualised or annotated.
Potential solution: add an optional input that outputs headers as separate passages.
Potential issues: this will cause the pytest tests to fail, as the output files will differ from the expected files.
Current state: a PR has been raised that implements infons.type for the document title (using title_1 instead of front), passages (paragraph), and references (ref). This can be expanded to headers with three types (title_1, title_2, title_3), as well as to other items: table, table_caption, table_foot, table_footnote, and fig_caption.
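
A minimal sketch of how the optional split could work, for discussion only. It is not the code from the PR. It assumes passages are BioC JSON-style dicts with "offset", "infons", and "text" fields, and that headers sit in infon keys named section_title_1 to section_title_3. Those key names, the function name split_headers, and the offset recalculation (one separator character between passages) are all assumptions for illustration.

```python
from copy import deepcopy

# Assumed infon keys that hold the headers, ordered from top level down.
HEADER_KEYS = ("section_title_1", "section_title_2", "section_title_3")


def split_headers(passages, header_keys=HEADER_KEYS):
    """Return a new passage list in which each header becomes its own passage.

    A header passage is emitted the first time a header value appears at a
    given level. Offsets are recomputed, assuming one separator character
    between consecutive passages (an assumption, not the BioC requirement).
    """
    out = []
    last_seen = [None] * len(header_keys)
    offset = 0
    for passage in passages:
        infons = passage.get("infons", {})
        for level, key in enumerate(header_keys):
            title = infons.get(key)
            if title is None:
                last_seen[level] = None
                continue
            if title != last_seen[level]:
                last_seen[level] = title
                # A new header invalidates the deeper levels seen so far.
                for deeper in range(level + 1, len(header_keys)):
                    last_seen[deeper] = None
                out.append({
                    "offset": offset,
                    "infons": {"type": f"title_{level + 1}"},
                    "text": title,
                    "annotations": [],
                })
                offset += len(title) + 1
        body = deepcopy(passage)
        body["offset"] = offset
        # Keep an existing type if present, otherwise mark as paragraph.
        body.setdefault("infons", {}).setdefault("type", "paragraph")
        out.append(body)
        offset += len(body.get("text", "")) + 1
    return out
```

If the split is only applied when the optional input is set, the default output would stay unchanged and the existing pytest comparison files would not need to be regenerated; only the new mode would need its own expected files.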