workspace bagger: in-place behaves odd

Open bertsky opened this issue 5 years ago • 3 comments

I am trying to fix bagit checksums in OCR-D/assets#64 with ocrd zip bag -Z -I:

  • When I use -d data, then the bagger will move everything to data/data.
  • When I omit -d data (i.e. effectively use -d . in the bagit directory), then the bagger won't find the mets.xml.
  • When I use -m data/mets.xml this does not work either, because apparently the --mets option in this command (in stark contrast to all the other commands and the workspace processor CLI) does not denote the (absolute or CWD-relative) METS path on the input side, but the workspace-relative METS path on the output side.

So how do I use this tool at all?

bertsky avatar Dec 03 '19 11:12 bertsky

I will look into this, but in the meantime, if you want to update checksums, have a look at https://github.com/kba/ocrd-docs/blob/master/update-bagit It's what I use for assets

kba avatar Dec 04 '19 12:12 kba

Also not behaving in a useful way: without -I, whatever is specified as DEST does not get used directly as output directory. Instead, a new subdirectory with a randomized name gets used (silently).

bertsky avatar Jul 02 '20 17:07 bertsky

I will revisit this as soon as possible in the context of revamping our GT setup.

kba avatar Jul 07 '20 11:07 kba

Hi, I recently needed to recreate checksums for BagIt bags and wanted to know whether ocrd can do it. That is how I came across this issue. I hope you don't mind me digging it up again, and that I understand it correctly.

My first question is what your goal is. Is it to recreate/fix the checksums with ocrd, or is there a problem with the -I switch (if so, I don't understand that)? Because I think -I does exactly what is stated in the description (even if that is pretty useless). The description says: "Replace workspace with bag (like bagit.py does)". The switch replaces/deletes a workspace (mets and filegroups, just a workspace, not a bag) and puts the bagit stuff (the data dir containing the workspace, bag-info.txt, manifest-sha512.txt etc.) where the workspace has been.
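To make the described in-place behaviour concrete, here is a minimal sketch of what "replace workspace with bag" could look like. It is not the OCR-D implementation: `bag_in_place` is a hypothetical helper, only the standard library is used, and bag-info.txt and the tag manifest are left out for brevity.

```python
import hashlib
import tempfile
from pathlib import Path


def bag_in_place(workspace: Path) -> None:
    """Hypothetical helper: move the workspace contents under data/ and
    write minimal BagIt tag files next to it (assumption, not OCR-D code)."""
    # Snapshot the entries first, so the staging directory is not moved into itself.
    entries = list(workspace.iterdir())
    staging = Path(tempfile.mkdtemp(dir=workspace))
    for entry in entries:
        entry.rename(staging / entry.name)
    # An existing "data" entry has been moved away by now, so the name is free.
    payload = workspace / "data"
    staging.rename(payload)
    payload.chmod(0o755)  # mkdtemp creates the directory with mode 0700

    # One manifest line per payload file: "<sha512>  <path relative to the bag root>"
    lines = []
    for f in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha512(f.read_bytes()).hexdigest()
        lines.append(f"{digest}  {f.relative_to(workspace).as_posix()}")
    (workspace / "manifest-sha512.txt").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    (workspace / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
```

Note that running such a helper on a directory that already is a bag moves the existing data/ into a new data/, which gives a nested data/data payload.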

What I would want to do/change is this: use ocrd zip bag -I -d some/bag/dir foo/ and then the bag at some/bag/dir or rather the stuff in data/ and bag-info.txt is used to create a new bag which is put into foo/.

joschrew avatar Oct 27 '22 18:10 joschrew

> My first question is what your goal is?

For bag -Z -I it is simply to update the bag checksums (say after some changes to the METS and files).
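For illustration, updating the checksums of an existing bag could be sketched as follows. This is a standalone sketch, not what ocrd zip bag does internally: `refresh_manifest` is a hypothetical name, and the layout (payload under data/, manifest-sha512.txt at the bag root) is assumed from the description in this thread.

```python
import hashlib
from pathlib import Path


def refresh_manifest(bag_dir: Path) -> list:
    """Hypothetical helper: rewrite manifest-sha512.txt from the current
    payload and return the paths whose entry was added, removed or changed."""
    manifest = bag_dir / "manifest-sha512.txt"

    # Read the old entries, if any: "<digest>  <path>" per line.
    old = {}
    if manifest.exists():
        for line in manifest.read_text(encoding="utf-8").splitlines():
            if line.strip():
                digest, path = line.split(maxsplit=1)
                old[path] = digest

    # Hash every payload file in chunks, so large images do not need to fit in memory.
    new = {}
    for f in sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file()):
        h = hashlib.sha512()
        with f.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                h.update(chunk)
        new[f.relative_to(bag_dir).as_posix()] = h.hexdigest()

    manifest.write_text(
        "".join(f"{digest}  {path}\n" for path, digest in new.items()),
        encoding="utf-8",
    )
    return sorted(p for p in new.keys() | old.keys() if new.get(p) != old.get(p))
```

A bag that also carries a tag manifest (tagmanifest-sha512.txt) would need that file refreshed afterwards as well, because the payload manifest it covers has changed.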

> Is it to recreate/fix the checksums with ocrd or is there a problem with the -I switch (if so I don't understand that)?

For bag -Z there is a problem with the chosen output directory – DEST does not get used (see above).

For bag -I I don't know what I should expect. Replacing the directory with a zip does not make much sense to me (perhaps due to lack of imagination).

> Because I think -I does exactly what is stated in the description (even if that is pretty useless): The description says: "Replace workspace with bag (like bagit.py does)". The switch replaces/deletes a workspace (mets and filegroups, just a workspace not a bag) and puts the bagit-stuff: data-dir (containing the workspace), bag-info.txt, manifest-sha512.txt etc to where the workspace has been.

I see. But how can you have a (zip) file where a directory was – without renaming it to *.zip?

bertsky avatar Nov 13 '22 15:11 bertsky

BTW, I believe the cause of the misbehaviour might just be the wrong order of the arguments:

https://github.com/OCR-D/core/blob/e841ce8443ec7e0fa90e99796b3a947be09844d9/ocrd/ocrd/cli/zip.py#L31-L48

Notice how directory and dest have been confused.

bertsky avatar Nov 13 '22 15:11 bertsky

Thanks for answering.

> For bag -Z there is a problem with the chosen output directory – DEST does not get used (see above).

In my tests it worked as expected (although if DEST is not specified, it gets a strange default value). My test case:

  • workspace: I have /tmp/workspace-1, which contains mets.xml and filegroup-dirs with files
  • command: ocrd zip bag -Z -d /tmp/workspace-1 /tmp/workspace-1-output -i something
  • result: /tmp/workspace-1-output contains data/mets.xml, data/FILEGROUP and bag-files as expected.

> Notice how directory and dest have been confused.

I don't see what you mean. I see dest is specified as a click-argument before directory but in the function declaration it is the other way around. Did you mean that? But when delegating to workspace-bagger the arguments are given correctly as kwargs so I don't see what could be wrong here and, as I said, it worked in my tests.
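The point about ordering can be illustrated with a small standalone sketch. click hands the parsed parameters to the command callback as keyword arguments, so the order in which they are declared does not have to match the order in the function signature. The names below (`bag`, `invoke`, `parsed`) are hypothetical stand-ins and not the code in zip.py:

```python
def bag(directory, dest):
    """Callback whose signature order differs from the declaration order."""
    return {"directory": directory, "dest": dest}


def invoke(callback, params):
    """Hypothetical stand-in for the CLI layer: bind parsed values by name."""
    return callback(**params)


# Parsed values, listed in declaration order (dest first, then directory).
parsed = {"dest": "/tmp/workspace-1-output", "directory": "/tmp/workspace-1"}

# Binding by name: each value ends up in the right parameter.
result = invoke(bag, parsed)

# Binding by position would swap them, which is the confusion one might suspect.
swapped = bag(*parsed.values())
```

Here `result["dest"]` is `/tmp/workspace-1-output`, whereas `swapped["dest"]` is `/tmp/workspace-1`. A swap can only happen where values are passed positionally.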

joschrew avatar Nov 14 '22 20:11 joschrew

From my point of view this can be closed because of #964 and #951. I think I used the fix keyword incorrectly in #951, which is why this issue didn't close automatically.

joschrew avatar Dec 16 '22 09:12 joschrew

Yes, this should be fixed in https://github.com/OCR-D/core/releases/tag/v2.44.0.

AFAICS the last remaining issue is

> For bag -Z there is a problem with the chosen output directory – DEST does not get used (see above).

Which I cannot reproduce (anymore). For example, if I do

cd repo/assets/data/kant_aufklaerung_1784
ocrd zip bag -d data -Z -I foo

I get the OCRD-ZIP I expect, without nested data. So all seems good now. Thanks @joschrew for fixing this.

kba avatar Dec 19 '22 12:12 kba