Robert Sachunsky

721 comments by Robert Sachunsky

Yes, it should be possible to skip pages of certain types in the logical structMap – not just in any one processor, but as a general mechanism for workflows...
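Here is a minimal sketch of how such a workflow-level filter could select pages. It assumes a METS file whose logical and physical structMaps are linked via `mets:structLink`/`mets:smLink`, which is the usual METS convention. The function name `pages_to_process` and its `skip_types` parameter are hypothetical and not part of OCR-D core:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Standard METS and XLink namespace URIs
NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}
XLINK_FROM = "{%s}from" % NS["xlink"]
XLINK_TO = "{%s}to" % NS["xlink"]


def pages_to_process(mets_xml: str, skip_types: set[str]) -> list[str]:
    """Physical page IDs in METS order, minus pages linked to a skipped logical type.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(mets_xml)
    # IDs of logical divisions whose TYPE should be skipped
    skip_logical = {
        div.get("ID")
        for div in root.iterfind(".//mets:structMap[@TYPE='LOGICAL']//mets:div", NS)
        if div.get("TYPE") in skip_types
    }
    # Physical pages linked to any of those divisions via structLink
    skip_physical = {
        link.get(XLINK_TO)
        for link in root.iterfind(".//mets:structLink/mets:smLink", NS)
        if link.get(XLINK_FROM) in skip_logical
    }
    pages = root.iterfind(
        ".//mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']", NS
    )
    return [div.get("ID") for div in pages if div.get("ID") not in skip_physical]
```

For example, `pages_to_process(mets_xml, {"cover_front"})` would drop every physical page linked to a division of that type. The type name is used here only as an example. A page counts as skipped if it is linked to *any* skipped division, even when it is also linked to a parent division such as the whole `monograph`.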

> Should we take this to an OCR-D core or spec issue?

Yes, we should elevate this to OCR-D/spec.

> I have some additional thoughts to discuss (like: What happens...

Or maybe I should've read the documentation: so training would be via [sbb_pixelwise_segmentation](https://github.com/qurator-spk/sbb_pixelwise_segmentation), correct?

Thanks @vahidrezanezhad and @cneud for getting back to me so quickly!

> Those are fine and valid questions. Our current order of priorities is roughly like this: refactoring codebase -> OCR-D...

> Just clarify that we did have only latin scripted documents, so clearly it will not work for chinese or arabic ones.

I meant that if in your document you...

My main requirement for a sanitizer would be ensuring that ingest into Kitodo.Presentation / DFG-Viewer works. According to [this](https://github.com/kitodo/kitodo-presentation/issues/337#issuecomment-491839819), we are already close, but...

- our ALTO must be v2.0 currently...

I stand corrected: as [this example](http://dfg-viewer.de/show/?tx_dlf%5Bid%5D=http%3A%2F%2Fdfgviewer.cloutodo.de%2FCCS-puS4EaAkXFrkaNhbF9sT%2F202105311521_20210531A%2F202105311521_20210531A.xml&no_cache=1%3E) by @stefanCCS – [METS](http://dfgviewer.cloutodo.de/CCS-puS4EaAkXFrkaNhbF9sT/202105311521_20210531A/202105311521_20210531A.xml) and [ALTO](http://dfgviewer.cloutodo.de/CCS-puS4EaAkXFrkaNhbF9sT/202105311521_20210531A/ALTO/00000001.xml) – shows, `MIMETYPE="application/alto+xml"` and ALTO v4.1 actually do work. (That is, newer features are simply ignored.)
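A sanitizer could at least report which ALTO version a file declares. ALTO namespaces follow the pattern `http://www.loc.gov/standards/alto/ns-vN#`, so only the major version can be read from the root element's namespace (4.0 and 4.1 both use `ns-v4#`). The minor version would have to come from elsewhere, e.g. the `xsi:schemaLocation`. Here is a minimal sketch; the helper name `alto_major_version` is hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# ALTO namespaces look like http://www.loc.gov/standards/alto/ns-v4#;
# ElementTree reports the root tag as "{namespace}alto".
ALTO_NS = re.compile(r"\{http://www\.loc\.gov/standards/alto/ns-v(\d+)#\}")


def alto_major_version(alto_xml: str):
    """Return the ALTO major version declared by the root namespace, or None."""
    root = ET.fromstring(alto_xml)
    match = ALTO_NS.match(root.tag)
    return int(match.group(1)) if match else None
```

Given the example above, such a check would probably be better used to warn about versions newer than v2 than to reject them.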

Also:
- if any `/PcGts/Page/ReadingOrder/(OrderedGroup|OrderedGroupIndexed)/@index` is not in order (or clashing)
- if any `//TextEquiv/@index` is not in order (or clashing)
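Here is a minimal sketch of both checks. It assumes the PAGE-XML schema, in which `@index` appears on the `RegionRefIndexed`, `OrderedGroupIndexed` and `UnorderedGroupIndexed` children of an ordered group, and on `TextEquiv` elements. The code compares `@index` values among siblings in document order. "Clashing" means a duplicate value, and "not in order" means a value smaller than its predecessor. All function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def index_problems(indices):
    """Classify the @index values of one sibling group, given in document order."""
    problems = []
    if len(set(indices)) != len(indices):
        problems.append("clashing")
    if any(a > b for a, b in zip(indices, indices[1:])):
        problems.append("not in order")
    return problems


def check_page(page_xml):
    """Return (parent element, group kind, problems) for every bad sibling group."""
    report = []
    root = ET.fromstring(page_xml)
    for parent in root.iter():
        groups = {"reading order": [], "TextEquiv": []}
        for child in parent:
            index = child.get("index")
            if index is None:
                continue
            name = local_name(child.tag)
            # RegionRefIndexed, OrderedGroupIndexed, UnorderedGroupIndexed
            if name.endswith("Indexed"):
                groups["reading order"].append(int(index))
            elif name == "TextEquiv":
                groups["TextEquiv"].append(int(index))
        for kind, indices in groups.items():
            problems = index_problems(indices)
            if problems:
                report.append((local_name(parent.tag), kind, problems))
    return report
```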

In practice, I would no longer use backups for re-processing a pipeline. But perhaps we should keep the mechanism nevertheless – precisely _because_ we will be using `overwrite` a lot. Also,...

Moreover, the current timing figures measure wall-clock time, not CPU time. We should at least complement them with CPU time. (Somewhat confusingly, the CPU time is called `time.clock` in Python,...
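For reference, `time.clock` was deprecated in Python 3.3 and removed in 3.8. Its replacements are `time.process_time()`, which returns the CPU time of the current process (user plus system), and `time.perf_counter()`, which is a high-resolution wall clock. Below is a minimal sketch that records both around a processing step. The `measure` context manager is hypothetical and not an existing OCR-D API:

```python
import time
from contextlib import contextmanager


@contextmanager
def measure(stats: dict):
    """Record wall-clock and CPU time (of this process) spent inside the block."""
    wall_start = time.perf_counter()  # monotonic, high-resolution wall clock
    cpu_start = time.process_time()   # user + system CPU time of this process
    try:
        yield stats
    finally:
        stats["wall"] = time.perf_counter() - wall_start
        stats["cpu"] = time.process_time() - cpu_start
```

A step that mostly waits (e.g. `time.sleep`) shows wall time but almost no CPU time. Conversely, CPU time can exceed wall time when libraries use several threads. Note that `time.process_time()` does not include child processes; `os.times()` reports those separately.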