Robert Sachunsky
> Besides the manual DPI override, this would also allow supporting DPI meta-data validation with different levels of strictness.
>
> Or supporting automatic workspace validation with different levels/sets of...
> Any preferences on the command line interface?

I fail to see the difference between 1 and 3. But I would prefer the `--not-*` scheme over `*-exclude/*-include`. What about `--not`...
Related to #506. **Also:** The newly exposed `ocrd validate page` would be much more useful if its file argument was multi-valued (accepting a list via shell pathname globbing): ```python @click.argument('page',...
> but I thought it would be less surprising if the `overwrite` case is handled by the download/copy method backends, because we have that pattern in other places too

I...
I'd like to bring in another idea: Maybe file locking is the wrong way to think of page-level parallelization altogether. We have dependencies between successive processors in a workflow, so...
One thing we have to take into account are processors that run on multiple input file groups. For those, `process_page` will have to take a tuple of `OcrdFile`s – for...
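To make the idea of a page-level `process_page` over several input file groups concrete, here is a minimal sketch. It is not the OCR-D core API: `InputFile`, `group_by_page`, `process_workspace`, the `(file group, page ID, path)` layout and the thread pool are all assumptions chosen for illustration. Only the notion that `process_page` receives a tuple with one entry per input file group comes from the comment above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for an OcrdFile: (file group, page ID, path).
InputFile = Tuple[str, str, str]
PageInputs = Tuple[Optional[InputFile], ...]


def group_by_page(files: List[InputFile], file_groups: List[str]) -> Dict[str, PageInputs]:
    """Build one tuple per page with one slot per input file group (None if missing)."""
    pages: Dict[str, Dict[str, InputFile]] = {}
    for grp, page_id, path in files:
        pages.setdefault(page_id, {})[grp] = (grp, page_id, path)
    return {
        page_id: tuple(by_grp.get(grp) for grp in file_groups)
        for page_id, by_grp in pages.items()
    }


def process_page(page_id: str, input_files: PageInputs) -> str:
    """Placeholder for the per-page work; here it only joins the available paths."""
    present = [f[2] for f in input_files if f is not None]
    return f"{page_id}: " + "+".join(present)


def process_workspace(files: List[InputFile], file_groups: List[str],
                      max_workers: int = 2) -> Dict[str, str]:
    """Run process_page for every page, with at most max_workers pages in flight."""
    pages = group_by_page(files, file_groups)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pid: pool.submit(process_page, pid, inputs)
                   for pid, inputs in pages.items()}
        return {pid: fut.result() for pid, fut in futures.items()}
```

Because each page is an independent unit of work here, the scheduler rather than a file lock decides what runs concurrently, and `max_workers` is the single knob bounding the parallelism.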
@jbarth-ubhd
> * would it be possible to parallelize single processing steps automatically within given bounds (CPU, RAM)

Yes, that's what the above proposal is about. If we change the...
> I've meant a completely separate parallel workflow (on other data), if it does not make sense to assign all free resources to 1 single workflow (e. g. if most...
> > * would it be possible to parallelize single processing steps automatically within given bounds (CPU, RAM)
>
> Yes, that's what the above proposal is about. If we...
> Especially processors with TensorFlow seem to use all available cores (but very inefficiently). That makes efficient parallelization on the book level currently difficult or even impossible.
>
> Ideally...