parsing and resolving resource parameters in ocrd_network

Open · bertsky opened this issue 10 months ago · 1 comment

Another problem is how parameter resources get resolved now. In the CLI setting, we allowed an ambiguity between JSON literals and file names, so the latter had to be resolved to absolute paths (across the various resource locations), which in turn obviously depended on the processor class (because of the custom module location):

https://github.com/OCR-D/core/blob/4d33491ab4e515a60e240cbab2b1510b4e58fa02/src/ocrd/decorators/__init__.py#L95-L103
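To make the CLI-side ambiguity concrete, here is a minimal sketch (not the actual code behind the link above). The function name `parse_cli_parameter` and the `resolve` callback are hypothetical; `resolve` stands in for the processor-specific lookup that maps a resource name to an absolute path.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def parse_cli_parameter(
    value: str,
    resolve: Callable[[str], Optional[Path]],
) -> Dict[str, Any]:
    """Interpret a CLI parameter value as a JSON literal or a file name.

    `resolve` is a hypothetical stand-in for the processor-dependent lookup:
    it returns an absolute path for a resource name, or None if not found.
    """
    stripped = value.strip()
    if stripped.startswith("{"):
        # Looks like a JSON object literal: parse it directly.
        return json.loads(stripped)
    # Otherwise treat the value as a file name that must be resolved first.
    path = resolve(stripped)
    if path is None:
        raise FileNotFoundError(f"cannot resolve parameter file {value!r}")
    return json.loads(path.read_text(encoding="utf-8"))
```

The point of the sketch is that the second branch cannot work without a resolver, and the resolver depends on where the processor is installed.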

But in the network setting, where parameters can be added to the processing or workflow request, no such resolution (from resource name to resource path) takes place:

  • On the server side, there is only a parameter validation step
    https://github.com/OCR-D/core/blob/4d33491ab4e515a60e240cbab2b1510b4e58fa02/src/ocrd_network/server_utils.py#L220
  • On the client side, there is only the JSON string vs file disambiguation, but no prior resolution.
    https://github.com/OCR-D/core/blob/4d33491ab4e515a60e240cbab2b1510b4e58fa02/src/ocrd_network/cli/client.py#L142

Did we abandon the entire mechanism, or am I missing something?

Originally posted by @bertsky in https://github.com/OCR-D/core/issues/1303#issuecomment-2639870376

bertsky, Feb 17 '25 17:02

So what does processor.resolve_resource(name) actually do?

Well, …

https://github.com/OCR-D/core/blob/47b77aa4249a93ebf9b279f43bb841c192facc20/src/ocrd/processor/base.py#L937-L938

… thus it uses ocrd_utils.list_resource_candidates() with the processor's self.moduledir. The latter (being installation-dependent) is only known to the processor's codebase, so it requires processor instantiation. That's why, outside the Processor class, we have ocrd_utils.get_moduledir() for this, which tries to read the install-time ocrd-all-module-dir.json file (which we deploy both in the ocrd/all fat images and in all continuously deployed thin/per-module Docker images), and only then falls back on a CLI instantiation (--dump-module-dir).
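The lookup order described here (install-time index first, CLI instantiation second) could be sketched roughly as follows. This is not the implementation of ocrd_utils.get_moduledir(); the function name `lookup_moduledir`, the `run_cli` hook, and the assumption that the JSON file maps executable names to directories are all mine.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, Optional


def _dump_module_dir(executable: str) -> str:
    # Fallback: ask the processor itself, which requires instantiating it.
    result = subprocess.run(
        [executable, "--dump-module-dir"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def lookup_moduledir(
    executable: str,
    index_file: Path,
    run_cli: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the module directory for `executable` (hypothetical helper).

    Assumes `index_file` holds a JSON object mapping executable names to
    module directories; only if that lookup fails is the CLI consulted.
    """
    try:
        index = json.loads(index_file.read_text(encoding="utf-8"))
        if isinstance(index, dict) and executable in index:
            return index[executable]
    except (OSError, json.JSONDecodeError):
        pass  # index missing or unreadable: fall through to the CLI
    return (run_cli or _dump_module_dir)(executable).strip()
```

Either way, both sources of truth live on the machine (or image) where the processor is installed, which is what matters for the network case.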

So what does that mean for ocrd_network?

Well, since even ocrd-all-module-dir.json is installation-dependent (and thus also processor-dependent), we need some runtime component (installed) on the processor side which can answer our list_resource_candidates() in a similar way. And that would have to be the new Resource Manager Server!

Which already uses the local ResourceManager's new build_resource_dest_dir() for the download case, so it could easily be extended to use the same for the resolve case.
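For illustration, the resolve case boils down to something like the sketch below: walk an ordered list of candidate locations and return the first hit. The helper name `resolve_in_locations` is hypothetical, and the list of locations is simply passed in here, whereas the real candidates would come from list_resource_candidates() with the processor's module directory.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_in_locations(name: str, locations: Iterable[Path]) -> Optional[Path]:
    """Return the first existing `location / name` as an absolute path.

    `locations` is assumed to be ordered by priority; None means that the
    resource name could not be resolved anywhere.
    """
    for location in locations:
        candidate = location / name
        if candidate.exists():
            return candidate.resolve()
    return None
```

A Resource Manager Server endpoint for resolving would then only need the processor's executable name and the resource name as input.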

Which BTW is not the same as the list_installed() case, unless we make the latter return only relative paths in the network setting. Currently, it is wrongly (!) implemented via list_available():

https://github.com/OCR-D/core/blob/47b77aa4249a93ebf9b279f43bb841c192facc20/src/ocrd_network/resource_manager_server.py#L102-L107

Regardless of how we do that concretely, once we resolve parameter values as potential relative path names via RMS requests, we can then (and only then) correctly validate the parameters for the processing or workflow request. IMO this does not need to be on the client side though. And it does not make sense at all to carry the JSON literal vs. path name ambiguity over to the network clients as well: this was only a convenience feature for the CLI user. In contrast, in ocrd_network, we should be clear as to what is a JSON literal, what is a parameter file to parse on the client side, and what is a path name to resolve on the worker side (but also pre-resolve on the server side for early validation).
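To illustrate the separation I have in mind, here is a rough sketch. All names (`JsonLiteral`, `ParameterFile`, `load_parameters`, `unresolved_resources`) are hypothetical, and `resolve` stands in for whatever RMS request ends up doing the lookup; how the server learns which parameters are resource-valued is left open and just passed in as `resource_keys`.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional, Union


@dataclass(frozen=True)
class JsonLiteral:
    """Parameters given inline as a JSON string."""
    text: str


@dataclass(frozen=True)
class ParameterFile:
    """Parameters stored in a file that the client reads itself."""
    path: Path


def load_parameters(source: Union[JsonLiteral, ParameterFile]) -> Dict[str, Any]:
    """Client side: no guessing, the caller states which kind it is."""
    if isinstance(source, JsonLiteral):
        return json.loads(source.text)
    return json.loads(source.path.read_text(encoding="utf-8"))


def unresolved_resources(
    params: Dict[str, Any],
    resource_keys: Iterable[str],
    resolve: Callable[[str], Optional[str]],
) -> List[str]:
    """Server side: pre-resolve resource-valued parameters for early validation.

    Returns the keys whose values could not be resolved; an empty list means
    the request can be passed on to the worker.
    """
    missing = []
    for key in sorted(resource_keys):
        if key in params and resolve(str(params[key])) is None:
            missing.append(key)
    return missing
```

The worker would still resolve the same names again at run time; the server-side pass only exists to reject bad requests early.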

bertsky, Dec 15 '25 12:12