Generate per-workspace CER + WER

Open mikegerber opened this issue 5 years ago • 4 comments

  1. It's easy to calculate this from the individual CER/WER and the character/word counts.
  2. But how to save a global JSON report in the METS? It would not "manifest a physical page" which OCR-D seems to demand for any file
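
The calculation in point 1 can be sketched as a count-weighted mean. This is a minimal illustration, not dinglehopper's code: `PageResult` and `aggregate` are hypothetical names, and it assumes each page's CER/WER is normalized by that page's reference character/word count, so that rate times count recovers the page's error count.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    """Per-page figures (hypothetical container, not dinglehopper's API)."""
    cer: float    # character error rate of the page
    n_chars: int  # number of reference characters on the page
    wer: float    # word error rate of the page
    n_words: int  # number of reference words on the page


def aggregate(pages):
    """Return (workspace CER, workspace WER) as count-weighted means."""
    total_chars = sum(p.n_chars for p in pages)
    total_words = sum(p.n_words for p in pages)
    # rate * count gives each page's error count (under the stated assumption);
    # dividing the summed errors by the summed counts gives the overall rate.
    cer = sum(p.cer * p.n_chars for p in pages) / total_chars if total_chars else 0.0
    wer = sum(p.wer * p.n_words for p in pages) / total_words if total_words else 0.0
    return cer, wer
```

Weighting by counts matters: a plain average of the page rates would let a nearly empty page influence the result as much as a full one.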

mikegerber avatar Oct 17 '20 14:10 mikegerber

But how to save a global JSON report in the METS? It would not "manifest a physical page" which OCR-D seems to demand for any file

@kba @bertsky @cneud Any thoughts on this?

mikegerber avatar Oct 17 '20 14:10 mikegerber

Yes, this became possible when we agreed on an "official" way to have global (document-wide) files in the METS. You just put it in there without a pageId (i.e. without a reference to the file in the structMap) and use a certain convention for the file ID (with FULLDOWNLOAD, IIRC).

Now that different MIME types are allowed in fileGrps, having an output fileGrp with page-wise and document-global reports should be no problem.
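
As a rough sketch of what "without a pageId" means at the XML level, the following adds a `mets:file` to a fileGrp and leaves the structMap alone. It uses only the Python standard library rather than the OCR-D API. The function name, the fileGrp name and the file ID in the usage note are made up for illustration, and the exact ID convention mentioned above is not reproduced here.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK = "{http://www.w3.org/1999/xlink}"
ET.register_namespace("mets", "http://www.loc.gov/METS/")
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")


def add_global_file(mets_root, file_grp, file_id, href, mimetype="application/json"):
    """Add a mets:file to file_grp; no mets:fptr is written to any structMap."""
    file_sec = mets_root.find(METS + "fileSec")
    if file_sec is None:
        raise ValueError("METS has no mets:fileSec")
    # Reuse the fileGrp with the requested USE attribute, or create it.
    grp = next((g for g in file_sec.findall(METS + "fileGrp")
                if g.get("USE") == file_grp), None)
    if grp is None:
        grp = ET.SubElement(file_sec, METS + "fileGrp", {"USE": file_grp})
    file_el = ET.SubElement(grp, METS + "file",
                            {"ID": file_id, "MIMETYPE": mimetype})
    ET.SubElement(file_el, METS + "FLocat",
                  {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "FILE", XLINK + "href": href})
    return file_el
```

A call such as `add_global_file(root, "OCR-D-EVAL", "OCR-D-EVAL_GLOBAL", "OCR-D-EVAL/report.json")` (all three strings are placeholders) would register the report as a document-wide file, since nothing in the physical structMap points at it.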

bertsky avatar Oct 17 '20 17:10 bertsky

BTW I believe having a measurement of CER standard deviation or variance is also useful. See here for an implementation.
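
For illustration only (this is not the linked implementation), the spread of per-page CER values could be computed with Python's `statistics` module. `cer_spread` is a hypothetical helper, and it treats the pages as the whole population rather than a sample.

```python
from statistics import mean, pstdev, pvariance


def cer_spread(page_cers):
    """Return mean, population variance and standard deviation of page CERs."""
    cers = list(page_cers)
    if not cers:
        raise ValueError("need at least one page CER")
    return {
        "mean": mean(cers),          # unweighted mean over pages
        "variance": pvariance(cers),
        "stddev": pstdev(cers),
    }
```

Note that the unweighted mean over pages is generally not the same as a CER computed over all characters of the document, so the two figures should be reported separately.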

bertsky avatar Oct 30 '20 15:10 bertsky

(Closing the issue was an accident, I often hit the wrong buttons)

mikegerber avatar Oct 30 '20 16:10 mikegerber