Robert Sachunsky comments

Results: 721 comments by Robert Sachunsky

Additional parameter for custom resolution

> Besides the manual DPI override, this would also allow supporting DPI meta-data validation with different levels of strictness. > > Or supporting automatic workspace validation with different levels/sets of...

Clone filter

> Any preferences on the command line interface? I fail to see the difference between 1 and 3. But I would prefer the `--not-*` scheme over `*-exclude/*-include`. What about `--not`...

Allow restricting the workspace validator to a subset of file groups.

Related to #506. **Also:** The newly exposed `ocrd validate page` would be much more useful if its file argument was multi-valued (accepting a list via shell pathname globbing): ```python @click.argument('page',...

resmgr: properly implement --overwrite, fix #690

> but I thought it would be less surprising if the `overwrite` case is handled by the download/copy method backends, because we have that pattern in other places too I...

change API to get page-level parallelization everywhere

I'd like to bring in another idea: Maybe file locking is the wrong way to think of page-level parallelization altogether. We have dependencies between successive processors in a workflow, so...

change API to get page-level parallelization everywhere

One thing we have to take into account are processors that run on multiple input file groups. For those, `process_page` will have to take a tuple of `OcrdFile`s – for...
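
To illustrate the point about multiple input file groups, here is a minimal sketch using only the standard library. It is not the OCR-D API: `PageFile` is a hypothetical stand-in for `OcrdFile`, and `zip_input_files`, `process_workspace`, and the body of `process_page` are assumptions made up for this example. It groups files by page ID, builds one tuple per page (one slot per file group, `None` where a group has no file for that page), and hands each tuple to a per-page worker in a thread pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFile:
    """Hypothetical stand-in for OcrdFile: one file of one page in one file group."""
    file_grp: str
    page_id: str
    path: str


def zip_input_files(files, file_grps):
    """Return one tuple per page, ordered like file_grps (None if a group lacks the page)."""
    by_page = defaultdict(dict)
    for f in files:
        if f.file_grp in file_grps:
            by_page[f.page_id][f.file_grp] = f
    return [
        tuple(by_page[page_id].get(grp) for grp in file_grps)
        for page_id in sorted(by_page)
    ]


def process_page(input_files):
    """Hypothetical per-page worker: receives a tuple of files, one per input file group."""
    page_id = next(f.page_id for f in input_files if f is not None)
    # A real processor would read the inputs and write an output file here.
    return page_id, [f.path if f is not None else None for f in input_files]


def process_workspace(files, file_grps, max_workers=2):
    """Run process_page on every page tuple concurrently, keeping page order."""
    page_tuples = zip_input_files(files, file_grps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_page, page_tuples))
```

Because each call only sees the files of a single page, the scheduling (threads here, but processes or a job queue would work the same way) stays outside the processor logic.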

change API to get page-level parallelization everywhere

@jbarth-ubhd > * would it be possible to parallelize single processing steps automatically within given bounds (CPU, RAM) Yes, that's what the above proposal is about. If we change the...

change API to get page-level parallelization everywhere

> I've meant a completely separate parallel workflow (on other data), if it does not make sense to assign all free resources to 1 single workflow (e. g. if most...

change API to get page-level parallelization everywhere

> > * would it be possible to parallelize single processing steps automatically within given bounds (CPU, RAM) > > Yes, that's what the above proposal is about. If we...

change API to get page-level parallelization everywhere

> Especially processors with TensorFlow seem to use all available cores (but very inefficiently). That makes efficient parallelization on the book level currently difficult or even impossible. > > Ideally...