Robert Sachunsky
Not sure if this is the right place for a discussion, but IMO this is _not_ the right approach for efficient prediction yet. We should define a tf.data pipeline, allowing...
Sorry, in my previous comment I was thinking more about Eynollah than the Binarizer (hence the heavy CPU part). And @apacha's PR does already speed up by an order of...
tf.data pipelining with heavy CPU processing itself seems to be hard to get right: to get true parallelisation, one probably needs [tfaip](https://github.com/Planet-AI-GmbH/tfaip#data-pipeline-1)...
What I describe happens on TF 2.13.1, which should be fully supported. This issue is a show-stopper for me, as with OCR-D, it's not even possible to keep the results...
Spoiler: I know how to do this. Would you care for a PR?
> > But how to save a global JSON report in the METS? It would not "manifest a physical page" which OCR-D seems to demand for any file > >...
BTW I believe having a measurement of CER standard deviation or variance is also useful. See [here](https://github.com/ASVLeipzig/cor-asv-ann/blob/0ae6867eba39f73f5832b219f09f71788145d1c2/ocrd_cor_asv_ann/lib/alignment.py#L414-L433) for an implementation.
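To make the suggestion concrete, here is a minimal sketch of how a mean and standard deviation of per-line CER could be accumulated. It is not the linked cor-asv-ann code: the function names `edit_distance`, `cer` and `cer_stats` are made up for illustration, the distance is plain Levenshtein on code points, and the running statistics use Welford's online update so the per-line values never need to be stored.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted per code point."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(ocr: str, gt: str) -> float:
    """Character error rate of one line, normalised by ground-truth length."""
    if not gt:
        # Assumption: an empty GT line counts as fully wrong unless OCR is empty too.
        return 0.0 if not ocr else 1.0
    return edit_distance(ocr, gt) / len(gt)


def cer_stats(pairs):
    """Return (mean, sample variance, standard deviation) of per-line CER.

    `pairs` is an iterable of (ocr_text, gt_text) tuples. Uses Welford's
    online algorithm, so it works on a stream of arbitrary length.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for ocr, gt in pairs:
        x = cer(ocr, gt)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance, math.sqrt(variance)


if __name__ == "__main__":
    lines = [("abc", "abc"), ("abd", "abc"), ("xyz", "abc")]
    mean, var, std = cer_stats(lines)
    print(f"mean CER {mean:.3f}, variance {var:.3f}, stddev {std:.3f}")
```

Note that the mean of per-line CERs is generally not the same number as the aggregate CER (total edits divided by total ground-truth length), so a report should say which of the two it shows.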
Also, I wonder if this is even needed – #48 already covers prediction of a directory...
> > @cneud, yes, the issue can be solved with substitutions which can be configured by the users. > > Exactly. I would like to point out here that allowing...
> I just want to throw in some doubt on the belief that CERs are somehow comparable when produced by different tools. Do they count whitespace the same way? grapheme...