Robert Sachunsky comments

Results: 721 comments of Robert Sachunsky

performance of input_files on large workspaces

@MehmedGIT what's the status of your OcrdMets profiling experiment?

performance of input_files on large workspaces

@MehmedGIT I don't understand: the [benchmark-mets](https://github.com/OCR-D/core/tree/benchmark-mets) branch does not seem to contain any actual changes to the modules, only additional tests in `benchmarks` (i.e. outside the normal test set). Also,...

performance of input_files on large workspaces

@MehmedGIT I see, thanks.

Expose image region extraction code as CLI

But we first need the extraction Python API in core. Currently, it is only in `common` of OCR-D/ocrd_tesserocr, right? (related: OCR-D/ocrd_tesserocr#56)

Expose image region extraction code as CLI

Also, bashlib needs to offer more than just _extraction_: we also have to create PAGE-XML output with Olena binarization (referencing the new file in `AlternativeImage` and in the METS). So...

Expose image region extraction code as CLI

I fully agree. FYI, for Olena I am in the middle of a PR that will allow querying from and appending to PAGE's `AlternativeImage` with xmlstarlet – tentatively only one...

Expose image region extraction code as CLI

@kba, see bashlib-related FIXMEs in OCR-D/ocrd_olena#5

Expose image region extraction code as CLI

> can this be closed?

I don't think so. We should at least re-visit. What we have now as a proof of concept in `ocrd-olena-binarize` and `ocrd-im6convert` is based on...

Expose image region extraction code as CLI

> If we had a [generic image processor in Python](https://github.com/OCR-D/core/issues/385), we could probably reduce the need for that shell API greatly.

We do have that (as part of [ocrd_wrap](https://github.com/bertsky/ocrd_wrap)'s `ocrd-preprocess-image`),...

Expose image region extraction code as CLI

There's also ocrd_im6convert. And I don't think we should restrict bashlib in any way just because there are no more processors using it right now. In fact, I think it's...