Set mets:agent for bashlib processors
Indeed.
In Pythonic core, we use workspace.mets.add_agent during Processor.run_processor.
In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params] and wrap it in an exported function ocrd__add_agent in lib.bash, to be called by processors when they are done. Alternatively, we could include it directly in ocrd__wrap.
> In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params], and wrap that in some exported function ocrd__add_agent in lib.bash to be used by processors when done.
Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.
> Or we already include it in ocrd__wrap.
That would mean that the agent is added before any processing takes place, whereas in run_processor we only add the agent if the processing succeeds. So I think there's no way around bashlib processors adding the agent themselves as the last step of the script.
> Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.
To be consistent with what exactly?
In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.
Adding a general purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.
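To make the translation step above concrete, here is a minimal Python sketch of how a processor's CLI values might be mapped onto the name, role and note fields of a mets:agent. The helper name agent_fields_from_cli, the option labels used for the notes and the example processor name are all assumptions for illustration, not the actual core implementation.

```python
import json


def agent_fields_from_cli(executable, version, steps, input_file_grp,
                          output_file_grp, parameter, page_id=None):
    """Map processor CLI values to mets:agent fields (hypothetical helper).

    Returns a dict with the agent name, its OTHERROLE value and a list of
    (option, value) pairs that are meant to become mets:note elements.
    """
    notes = [
        ("input-file-grp", input_file_grp or ""),
        ("output-file-grp", output_file_grp or ""),
        # Serialize parameters with sorted keys so repeated runs compare equal.
        ("parameter", json.dumps(parameter or {}, sort_keys=True)),
        ("page-id", page_id or ""),
    ]
    return {
        "name": f"{executable} v{version}",
        "otherrole": " ".join(steps),
        "notes": notes,
    }


fields = agent_fields_from_cli(
    executable="ocrd-example-binarize",  # hypothetical processor name
    version="0.1.0",
    steps=["preprocessing/optimization/binarization"],
    input_file_grp="OCR-D-IMG",
    output_file_grp="OCR-D-BIN",
    parameter={"threshold": 0.5},
)
print(fields["name"])
print(fields["notes"])
```

Doing this once in Python keeps the quoting and JSON handling out of every individual shell script, which is the main argument for putting it behind a bashlib subcommand.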
>> Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.
> To be consistent with what exactly?
I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.
> In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.
Also true, there probably won't be a need to do ocrd workspace add-agent beyond bashlib, so I'm fine with either.
> Adding a general purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.
Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.
> I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.
ok, got it. But then we should also have get-agent etc.
So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.
> Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.
Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.
We could also do both. So:
- offer a bare-bones ocrd workspace add-agent
- offer an ocrd bashlib add-agent [CLI-params]
>> I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.
> ok, got it. But then we should also have get-agent etc.
> So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.
It's probably not a good investment of effort to offer generic CLI getters/setters for something we only need for bashlib (@maxnth raised this question, hence this issue). So, I'm good with just ocrd bashlib add-agent.
>> Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.
> Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.
Agreed, so we'd have an ocrd bashlib add-agent subcommand that accepts options for --executable, --other-role and the usual CLI arguments (-I, -O, -g, -P etc.) and adds a mets:agent just like at the end of run_processor.
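As a rough illustration of what such a subcommand would have to write, the following Python sketch appends a software mets:agent to a METS header using only the standard library. It is not the code in core: the function name add_agent as written here, the namespace used for the note attribute and the example values are assumptions, and a real implementation would go through workspace.mets rather than raw ElementTree.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
OCRD_NS = "https://ocr-d.de"  # assumption: namespace for the note attribute
ET.register_namespace("mets", METS_NS)
ET.register_namespace("ocrd", OCRD_NS)


def add_agent(mets_root, name, otherrole, notes):
    """Append a software mets:agent to mets:metsHdr (illustrative sketch)."""
    hdr = mets_root.find(f"{{{METS_NS}}}metsHdr")
    if hdr is None:
        # Create the header as first child if the document has none yet.
        hdr = ET.Element(f"{{{METS_NS}}}metsHdr")
        mets_root.insert(0, hdr)
    # Simplification: the agent is appended as the last child of metsHdr;
    # schema-valid output may require placing it before other header children.
    agent = ET.SubElement(hdr, f"{{{METS_NS}}}agent", {
        "TYPE": "OTHER",
        "OTHERTYPE": "SOFTWARE",
        "ROLE": "OTHER",
        "OTHERROLE": otherrole,
    })
    ET.SubElement(agent, f"{{{METS_NS}}}name").text = name
    for option, value in notes:
        note = ET.SubElement(agent, f"{{{METS_NS}}}note")
        note.set(f"{{{OCRD_NS}}}option", option)
        note.text = value
    return agent


root = ET.fromstring(
    f'<mets:mets xmlns:mets="{METS_NS}"><mets:metsHdr/></mets:mets>'
)
agent = add_agent(
    root,
    name="ocrd-example-binarize v0.1.0",  # hypothetical processor
    otherrole="preprocessing/optimization/binarization",
    notes=[("input-file-grp", "OCR-D-IMG"), ("output-file-grp", "OCR-D-BIN")],
)
print(ET.tostring(root, encoding="unicode"))
```

A bashlib processor would then only need to invoke the subcommand once, as the last step of the script, after processing has succeeded.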
Considering that most of the time I run processors via ocrd process rather than directly, and ocrd_network also relies on it, we could instead add the agent at the end of run_cli. Obviously we should not do both, and the CLI should work self-contained, so that's probably not a real solution.
What about processingStep PAGE-XML metadata? Should we also add an option for --page-xml, so that it is also consistent with what Processor.add_metadata does in the v3 API?