Optional split of headers into separate passages

Open · jmp111 opened this issue 11 months ago · 0 comments

Headers are stored in the infons of a passage in the BioC file. As a result, they have no text or annotation field of their own and cannot be visualised or annotated.

Potential solution: add an optional input that outputs headers as separate passages.
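A minimal sketch of what the optional split could look like, in Python (the issue already refers to pytest). It works on BioC passages held as plain dicts and assumes, hypothetically, that header text sits under infons keys named `section_title_1`, `section_title_2` and so on. The helper names `split_headers` and `header_passage`, the `title_N` type per header level, and the one-character gap used to recompute offsets are illustrative assumptions, not Auto-CORPus's actual code. Each header is emitted once, as its own passage with a `text` field, before the first body passage of its section.

```python
import copy
import re

# Hypothetical infon key layout: header text stored as "section_title_1",
# "section_title_2", ... on each body passage. Adjust to the real keys.
HEADER_KEY = re.compile(r"^section_title_(\d+)$")


def header_passage(level, text):
    """Build a BioC-style passage dict that holds a single header."""
    return {
        "offset": 0,  # recomputed in split_headers
        "infons": {"type": f"title_{level}"},  # assumed level -> type mapping
        "text": text,
        "sentences": [],
        "annotations": [],
        "relations": [],
    }


def split_headers(passages, max_level=3):
    """Return a new passage list with headers as separate passages.

    A header is emitted only when it changes, so a section title shared by
    several paragraphs appears once. The input list is not modified.
    """
    result = []
    current = {}  # level -> header text of the section currently open
    for passage in passages:
        headers = {}
        for key, value in passage.get("infons", {}).items():
            match = HEADER_KEY.match(key)
            if match and int(match.group(1)) <= max_level:
                headers[int(match.group(1))] = value
        changed = False
        for level in sorted(headers):
            # Once a level changes, every deeper header starts a new section too.
            if changed or current.get(level) != headers[level]:
                changed = True
                current[level] = headers[level]
                result.append(header_passage(level, headers[level]))
        # Forget header levels that this passage does not carry.
        for level in [lvl for lvl in current if lvl not in headers]:
            del current[level]
        # Keep the original passage (and its infons) untouched.
        result.append(copy.deepcopy(passage))
    # Assumption: passages are separated by one character in the document
    # text, so offsets are recomputed as a running total.
    offset = 0
    for passage in result:
        passage["offset"] = offset
        offset += len(passage.get("text", "")) + 1
    return result
```

Keeping the original passages as they are and only adding header passages means the default output need not change when the option is off.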

Potential issues: this will cause the pytest tests to fail, because the output files will differ from the expected ones.

Current state: a PR has been raised that implements infons.type for the document title (title_1 instead of front), passages (paragraph), and references (ref). This can be extended to headers, with three types (title_1, title_2, title_3), and to other items: table, table_caption, table_foot, table_footnote, fig_caption.
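The infons.type scheme described above can be sketched as a lookup. The element-kind names used as keys (`document_title`, `reference`, `figure_caption` and so on) and the function `passage_type` are hypothetical; only the type strings come from the issue.

```python
# Hypothetical element kinds mapped to the infons.type values listed in the issue.
PASSAGE_TYPES = {
    "document_title": "title_1",
    "paragraph": "paragraph",
    "reference": "ref",
    "table": "table",
    "table_caption": "table_caption",
    "table_foot": "table_foot",
    "table_footnote": "table_footnote",
    "figure_caption": "fig_caption",
}

# Proposed header types, indexed by header level 1-3.
HEADER_TYPES = ("title_1", "title_2", "title_3")


def passage_type(kind, level=None):
    """Return the infons.type string for an element kind (and header level)."""
    if kind == "header":
        if level is None or not 1 <= level <= len(HEADER_TYPES):
            raise ValueError(f"header level must be 1-{len(HEADER_TYPES)}, got {level!r}")
        return HEADER_TYPES[level - 1]
    try:
        return PASSAGE_TYPES[kind]
    except KeyError:
        raise ValueError(f"unknown element kind: {kind!r}") from None
```

Following the types listed above, the document title and a level-1 header both map to `title_1`.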

jmp111 · Feb 14 '25 12:02