Added dynamic section restructuring for non-PMC articles.

Open • Thomas-Rowlands opened this issue 2 years ago • 1 comment

Sections from journals such as Human Molecular Genetics were not cleanly contained in separate elements; instead, their contents were all listed as siblings throughout the main text.

This update detects matches that display this type of layout, then moves them into newly created (initially empty) parent divs more akin to PMC articles. The rest of the AC code base can then treat matches in the same manner as before.
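To illustrate the idea, here is a minimal sketch of wrapping a flat run of siblings into per-section parent divs. It is not the code from this PR: the function name `wrap_flat_sections`, the `HEADING_TAGS` set, and the `class="section"` attribute are all hypothetical, and it uses the standard library's `xml.etree.ElementTree` on a well-formed snippet rather than a real HTML parser.

```python
import xml.etree.ElementTree as ET

# Assumption: a heading tag marks the start of a new section.
HEADING_TAGS = {"h2", "h3"}


def wrap_flat_sections(body):
    """Group each heading and the siblings that follow it into a new parent <div>."""
    children = list(body)
    for child in children:
        body.remove(child)

    current = None  # the parent div currently being filled
    for child in children:
        if child.tag in HEADING_TAGS:
            # Start a new, initially empty parent div for this section.
            current = ET.SubElement(body, "div", {"class": "section"})
        if current is None:
            # Content before the first heading stays where it was.
            body.append(child)
        else:
            current.append(child)
    return body


# A flat layout: headings and paragraphs are all siblings.
flat = ET.fromstring(
    "<div><h2>Introduction</h2><p>Intro text.</p>"
    "<h2>Results</h2><p>First.</p><p>Second.</p></div>"
)
wrap_flat_sections(flat)
sections = flat.findall("div")
```

After the call, each heading sits inside its own div together with the paragraphs that followed it, which is the shape downstream code written for PMC-style articles would expect.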

Whitespace removal has been commented out until a few kinks are worked out.

Thomas-Rowlands, Jun 30 '23 13:06

Re-using this PR since it is based on the same branch of my fork. This now includes a merge of all existing branches within my fork (see changes above).

Main changes:
- Improved handling of poor HTML structuring in other journal articles.
- Reworked BioC tables code from months back (see above).
- Up-to-date supplementary material processing changes.
- XML output for BioC tables.

Thomas-Rowlands, Dec 05 '23 20:12