Clemens Neudecker comments

Results: 137 comments of Clemens Neudecker

Support optional stopword list

> would only count 1 error (lazy vs lazer). Exactly. Any words appearing in the GT and also in the stopword list are ignored when computing the "significant words" accuracy...
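A minimal sketch of how such a stopword-filtered count could work, assuming a simple case-insensitive bag-of-words comparison. The function name `significant_word_errors`, the sample sentences, and the stopword list are illustrative assumptions, not the actual implementation discussed in the issue.

```python
from collections import Counter


def significant_word_errors(gt_words, ocr_words, stopwords):
    """Return (errors, total) over GT words that are not stopwords.

    Assumption: words are compared case-insensitively as a bag of words,
    so word order and alignment are ignored.
    """
    stop = {w.lower() for w in stopwords}
    gt = Counter(w.lower() for w in gt_words if w.lower() not in stop)
    ocr = Counter(w.lower() for w in ocr_words if w.lower() not in stop)
    missing = gt - ocr  # significant GT words with no match in the OCR output
    return sum(missing.values()), sum(gt.values())


# Hypothetical example data: "the" and "over" are stopwords, so the OCR
# errors "teh" and "ovr" are ignored and only lazy vs lazer is counted.
gt = "the quick brown fox jumps over the lazy dog".split()
ocr = "teh quick brown fox jumps ovr the lazer dog".split()
errors, total = significant_word_errors(gt, ocr, stopwords=["the", "over"])
print(errors, total)
```

With these inputs, six significant GT words remain after filtering and exactly one of them has no match in the OCR output.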

Storing information on used processor parameters in the METS and PAGE

I am fine with what @bertsky proposed in https://github.com/OCR-D/spec/issues/108#issuecomment-503147346 too.

add 1st draft line GT/training specs

@wrznr So far I mainly applied [k-fold_cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation), would you still see added benefits over this by partitioning into three sets?
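To make the two options concrete, here is a rough sketch contrasting k-fold cross-validation with a fixed three-way partition. The helper names `k_fold_splits` and `three_way_split`, the fold assignment by stride, and the 80/10/10 ratio are assumptions chosen for illustration, not something the draft spec prescribes.

```python
def k_fold_splits(items, k):
    """Yield (train, held_out) pairs; each item is held out exactly once."""
    folds = [items[i::k] for i in range(k)]  # assign items to folds by stride
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def three_way_split(items, train_frac=0.8, val_frac=0.1):
    """Split once into train / validation / test (fractions are assumed)."""
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


lines = list(range(10))  # stand-ins for ground-truth line IDs
splits = list(k_fold_splits(lines, 5))
train, val, test = three_way_split(lines)
```

In the k-fold variant every line is used for evaluation once, while the three-way partition keeps a test set that is never touched during training or model selection.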

add 1st draft line GT/training specs

@wrznr Do your remaining ``@wrznr requested changes`` relate to [this comment](https://github.com/OCR-D/spec/pull/105#pullrequestreview-197476416) only or is there other stuff that needs changing (for the time being)?

Relation of METS and PAGE ReadingOrder

This was also a topic in [Europeana Newspapers](http://www.europeana-newspapers.eu/). See e.g. http://www.primaresearch.org/publications/ICDAR2013_Clausner_ReadingOrder and http://www.europeana-newspapers.eu/wp-content/uploads/2015/05/D5.3_Final_release_ENMAP_1.0.pdf

Relation of METS and PAGE ReadingOrder

This is only awaiting the updated guidelines, right? #80 is closed and I agree fully with https://github.com/OCR-D/spec/issues/40#issuecomment-421994713. For the main purposes of OCR-D we should avoid (modifying) the depths of...

ocrd-tool schema: be less restrictive on input/output_filegrp

It seems this went full circle, but regarding the general question of relaxing input/output fileGrp conventions: I would be fine with any form of relaxation that will still allow us to...

Metadata for OCR models and/or OCR model training sets

Just to let you know that I've been told today that [PMML](http://dmg.org/pmml/v4-3/GeneralStructure.html) is the widely accepted standard to describe ML models. It is XML-based. Perhaps we can learn/borrow some things...

Metadata for OCR models and/or OCR model training sets

@wrznr @kba @Doreenruirui This is already pretty far along: https://github.com/Doreenruirui/okralact/tree/master/docs, https://github.com/Doreenruirui/okralact/tree/master/engines/schemas, no?

allow intermediate PAGE annotation for word segmentation ambiguity

Ping @splet @chris1010010 for further opinions.