layout-analysis: (re)train
Since there is no documentation here for the training process and the training data, we have to make guesses.
The current model for (logical / whole-page) layout-analysis contains 21 classes:
['annotation', 'binding', 'chapter', 'colour_checker', 'contained_work', 'contents', 'cover', 'edge', 'endsheet', 'epicedia', 'illustration', 'index', 'musical_notation', 'page', 'paste_down', 'preface', 'provenance', 'section', 'sermon', 'table', 'title_page']
This is clearly inadequate: it mixes very specialised, rare types (sermon) with coarse, frequent ones (page). Moreover, it is very unlikely that such fine differentiation is feasible from visual classification of pages alone, i.e. independently of each other, without sequence context. For example, how could the hierarchy levels chapter and section be discerned reliably?
So IMO we should re-train this on a coarser set of types, say:
- `empty` (covering all non-text divs like `binding`, `colour_checker`, `cover`, `endsheet`)
- `title_page`
- `contents` (also including `index`)
- `page`
Perhaps additionally discerning table, illustration and musical_notation pages is doable, but that may well be considered part of physical / structural layout analysis (as these region types rarely occur alone on a page).
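For illustration, a remapping of the existing labels onto such a coarse set could look roughly like the sketch below; the exact assignments (especially for rare types like `sermon` or `epicedia`) are just my guess and open for discussion:

```python
# Hypothetical remapping of the current 21 fine-grained classes onto the
# proposed coarse set (assumption for illustration, not an official mapping).
COARSE_MAP = {
    'binding': 'empty', 'colour_checker': 'empty', 'cover': 'empty',
    'edge': 'empty', 'endsheet': 'empty', 'paste_down': 'empty',
    'title_page': 'title_page',
    'contents': 'contents', 'index': 'contents',
    # everything else (chapter, section, preface, sermon, ...) collapses to 'page'
}

def coarsen(label: str) -> str:
    """Map a fine-grained page label to its coarse class (default: 'page')."""
    return COARSE_MAP.get(label, 'page')
```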
Going back in the history, it is evident that the model has been trained on (an older version of) keras.applications.InceptionV3:
https://github.com/OCR-D/ocrd_anybaseocr/blob/3e897af5fde12a3b1a2cd701c3d66e1f9cc74e78/ocrd_anybaseocr/cli/ocrd_anybaseocr_layout_analysis.py#L62-L66
https://github.com/OCR-D/ocrd_anybaseocr/blob/3e897af5fde12a3b1a2cd701c3d66e1f9cc74e78/ocrd_anybaseocr/cli/ocrd_anybaseocr_layout_analysis.py#L73
https://github.com/OCR-D/ocrd_anybaseocr/blob/3e897af5fde12a3b1a2cd701c3d66e1f9cc74e78/ocrd_anybaseocr/cli/ocrd_anybaseocr_layout_analysis.py#L81-L82
https://github.com/OCR-D/ocrd_anybaseocr/blob/3e897af5fde12a3b1a2cd701c3d66e1f9cc74e78/ocrd_anybaseocr/cli/ocrd_anybaseocr_layout_analysis.py#L161-L165
https://github.com/OCR-D/ocrd_anybaseocr/blob/3e897af5fde12a3b1a2cd701c3d66e1f9cc74e78/ocrd_anybaseocr/cli/ocrd_anybaseocr_layout_analysis.py#L85-L88
So input seems to be 600x500px grayscale (1-channel), with a batch dimension in front.
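For retraining on the coarser set, a minimal sketch of rebuilding such a classifier on top of `keras.applications.InceptionV3` might look as follows; the input shape follows the 600x500 grayscale assumption above, while the classification head and hyperparameters are my own guesses, not necessarily what was used originally:

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # empty, title_page, contents, page (proposed coarse set)

# 600x500 single-channel input, as suggested by the linked code; the head
# (global pooling + softmax dense layer) is an assumption, not the original.
base = InceptionV3(include_top=False, weights=None,
                   input_shape=(600, 500, 1))
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```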
It would help to know what training data was previously used, though.
@n00blet could you please comment?