core

Set mets:agent for bashlib processors

Open • kba opened this issue 1 year ago • 6 comments

kba • Jul 29 '24 12:07

Indeed.

In the Python core, we use workspace.mets.add_agent during Processor.run_processor.

In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params], and wrap that in some exported function ocrd__add_agent in lib.bash to be used by processors when done. Or we include it directly in ocrd__wrap.
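
For reference, what such a subcommand has to write is a software agent inside mets:metsHdr. The following standard-library Python sketch builds one. The helper name add_agent, the TYPE/OTHERTYPE/ROLE/OTHERROLE values and the ocrd:option note attribute (with the https://ocr-d.de namespace) are assumptions modeled on METS files typically produced by OCR-D, not core's actual implementation; ocrd-example is a made-up processor name.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
OCRD_NS = "https://ocr-d.de"  # assumption: namespace used for the note attribute
ET.register_namespace("mets", METS_NS)
ET.register_namespace("ocrd", OCRD_NS)


def add_agent(mets_root, name, otherrole, notes):
    """Hypothetical helper: append a software mets:agent to mets:metsHdr.

    notes is a list of (option, value) pairs, each written as a mets:note.
    """
    hdr = mets_root.find(f"{{{METS_NS}}}metsHdr")
    if hdr is None:
        # mets:metsHdr must be the first child of mets:mets
        hdr = ET.Element(f"{{{METS_NS}}}metsHdr")
        mets_root.insert(0, hdr)
    agent = ET.Element(f"{{{METS_NS}}}agent", {
        "TYPE": "OTHER",
        "OTHERTYPE": "SOFTWARE",
        "ROLE": "OTHER",
        "OTHERROLE": otherrole,
    })
    ET.SubElement(agent, f"{{{METS_NS}}}name").text = name
    for option, value in notes:
        note = ET.SubElement(agent, f"{{{METS_NS}}}note", {f"{{{OCRD_NS}}}option": option})
        note.text = value
    # mets:agent elements come first in mets:metsHdr, so insert after the existing ones
    hdr.insert(len(hdr.findall(f"{{{METS_NS}}}agent")), agent)
    return agent


if __name__ == "__main__":
    root = ET.fromstring('<mets:mets xmlns:mets="http://www.loc.gov/METS/"><mets:metsHdr/></mets:mets>')
    add_agent(root, "ocrd-example v0.1", "preprocessing/optimization/binarization",
              [("input-file-grp", "OCR-D-IMG"), ("output-file-grp", "OCR-D-BIN")])
    print(ET.tostring(root, encoding="unicode"))
```

Whether this lives behind ocrd bashlib or ocrd workspace, the shell side would only need to pass the name, role and notes.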

bertsky • Jul 31 '24 21:07

> In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params], and wrap that in some exported function ocrd__add_agent in lib.bash to be used by processors when done.

Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.

> Or we include it directly in ocrd__wrap.

That would mean that the agent is added before any processing takes place, whereas in run_processor we only add the agent if the processing succeeds. So I think there's no way around bashlib processors adding the agent themselves as the last step of the script.
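
The ordering argument can be sketched in Python: run the processing first and record the agent only if it succeeded, which a call placed in ocrd__wrap (i.e. before processing) could not guarantee. The function name and the callback are hypothetical; in a bashlib processor the equivalent would simply be the final command of the script.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_then_add_agent(cmd: Sequence[str], add_agent: Callable[[], None]) -> int:
    """Run a processing command and record the agent only if it exits with status 0.

    Hypothetical helper illustrating the ordering: the agent is written as the
    very last step, so a failed run leaves no mets:agent behind.
    """
    result = subprocess.run(list(cmd))
    if result.returncode == 0:
        add_agent()
    return result.returncode


if __name__ == "__main__":
    recorded = []
    ok = run_then_add_agent([sys.executable, "-c", "pass"], lambda: recorded.append("agent"))
    failed = run_then_add_agent([sys.executable, "-c", "raise SystemExit(1)"],
                                lambda: recorded.append("agent"))
    print(ok, failed, recorded)  # 0 1 ['agent']
```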

kba • Aug 06 '24 10:08

> Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.

To be consistent with what exactly?

In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.

Adding a general-purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.

bertsky • Aug 06 '24 12:08

>> Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.

> To be consistent with what exactly?

I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.

> In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.

Also true: there probably won't be a need for ocrd workspace add-agent beyond bashlib, so I'm fine with either.

> Adding a general-purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.

Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.

kba • Aug 11 '24 11:08

> I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.

ok, got it. But then we should also have get-agent etc.

So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.

> Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.

Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.

We could also do both:

  • offer a bare-bones ocrd workspace add-agent
  • offer an ocrd bashlib add-agent [CLI-params]

bertsky • Aug 11 '24 19:08

>> I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.

> ok, got it. But then we should also have get-agent etc.

> So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.

It's probably not a good investment of effort to offer generic CLI getters/setters for something we only need for bashlib (@maxnth raised this question, hence this issue). So, I'm good with just ocrd bashlib add-agent.

>> Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.

> Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.

Agreed, so we'd have an ocrd bashlib add-agent subcommand that accepts options for --executable, --other-role and the usual CLI arguments (-I, -O, -g, -P etc.) and adds a mets:agent just like at the end of run_processor.
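
A minimal argparse sketch of that option surface and of the translation into mets:agent fields. The program name, the -m default and every flag merely mirror this proposal rather than an existing command; treating -P values as JSON when they parse (otherwise as plain strings) and the note keys input-file-grp, output-file-grp, page-id and parameter are assumptions. The sketch uses the bare executable as mets:name, whereas the name recorded by run_processor likely also carries the processor version.

```python
import argparse
import json


def parse_add_agent_args(argv):
    """Parse the options proposed for a hypothetical `ocrd bashlib add-agent`."""
    parser = argparse.ArgumentParser(prog="ocrd bashlib add-agent")
    parser.add_argument("-m", "--mets", default="mets.xml")
    parser.add_argument("--executable", required=True)
    parser.add_argument("--other-role", required=True)
    parser.add_argument("-I", "--input-file-grp", default="")
    parser.add_argument("-O", "--output-file-grp", default="")
    parser.add_argument("-g", "--page-id", default="")
    # -P KEY VAL may be repeated; each occurrence overrides one parameter
    parser.add_argument("-P", "--param-override", nargs=2, action="append",
                        metavar=("KEY", "VAL"))
    return parser.parse_args(argv)


def _parse_value(raw):
    # Assumption: values that are valid JSON (numbers, booleans, ...) keep their type
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def agent_fields(args):
    """Translate parsed options into (mets:name, OTHERROLE, mets:note pairs)."""
    params = {key: _parse_value(val) for key, val in (args.param_override or [])}
    notes = [
        ("input-file-grp", args.input_file_grp),
        ("output-file-grp", args.output_file_grp),
        ("page-id", args.page_id),
        ("parameter", json.dumps(params)),
    ]
    return args.executable, args.other_role, notes
```

For example, passing `-P dpi 300` yields a parameter note of `{"dpi": 300}`. Doing this in Python keeps lib.bash down to a one-line call that forwards the processor's own arguments.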

Considering that most of the time I run processors via ocrd process rather than directly, and that ocrd_network also relies on it, we could instead add the agent at the end of run_cli. Obviously we should not do both, and the CLI should work self-contained, so that's probably not a real solution.

What about processingStep PAGE-XML metadata? Should we also add a --page-xml option, so that this is also consistent with what Processor.add_metadata does in the v3 API?
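
For the PAGE-XML side, a standard-library sketch of appending a processingStep MetadataItem. The layout (name = processing step, value = executable, a Labels block with externalModel="ocrd-tool" and externalId="parameters") is an assumption modeled loosely on what Processor.add_metadata writes; the helper name is hypothetical and the 2019-07-15 PAGE namespace is assumed.

```python
import json
import xml.etree.ElementTree as ET

PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ET.register_namespace("pc", PAGE_NS)


def add_processing_step(pcgts, executable, step, parameters):
    """Hypothetical helper: append a processingStep MetadataItem to pc:Metadata."""
    metadata = pcgts.find(f"{{{PAGE_NS}}}Metadata")
    if metadata is None:
        raise ValueError("PcGts has no pc:Metadata element")
    # MetadataItem comes last in pc:Metadata, so appending keeps the element order valid
    item = ET.SubElement(metadata, f"{{{PAGE_NS}}}MetadataItem", {
        "type": "processingStep",
        "name": step,
        "value": executable,
    })
    labels = ET.SubElement(item, f"{{{PAGE_NS}}}Labels", {
        "externalModel": "ocrd-tool",
        "externalId": "parameters",
    })
    for key, value in parameters.items():
        # strings are stored as-is, everything else JSON-encoded (e.g. 300, true)
        text = value if isinstance(value, str) else json.dumps(value)
        ET.SubElement(labels, f"{{{PAGE_NS}}}Label", {"type": key, "value": text})
    return item
```

Unlike the mets:agent, this would have to run once per output PAGE file, which is one argument for keeping it behind a separate option.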

kba • Aug 12 '24 10:08