workspace bagger: in-place behaves odd
I am trying to fix bagit checksums in OCR-D/assets#64 with ocrd zip bag -Z -I:
- When I use -d data, then the bagger will move everything to data/data.
- When I omit -d data (i.e. effectively use -d . in the bagit directory), then the bagger won't find the mets.xml.
- When I use -m data/mets.xml this does not work either, because apparently the --mets option in this command (in stark contrast to all the other commands and the workspace processor CLI) does not denote the (absolute or CWD-relative) METS path on the input side, but the workspace-relative METS path on the output side.
So how do I use this tool at all?
I will look into this, but in the meantime, if you want to update checksums, have a look at https://github.com/kba/ocrd-docs/blob/master/update-bagit which is what I use for assets.
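For readers who only need the checksum refresh, here is a minimal Python sketch of the idea, not the linked script and not what ocrd zip bag does internally. It assumes a standard BagIt layout (payload under data/, checksums in manifest-sha512.txt); the function names are made up for illustration. It does not touch tagmanifest-sha512.txt or the Payload-Oxum entry in bag-info.txt, which would also need updating if present.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rewrite_payload_manifest(bag_dir: Path) -> list[str]:
    """Recompute manifest-sha512.txt from the files currently under data/."""
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    # One manifest line per payload file: "<checksum>  <path relative to bag root>"
    lines = [f"{sha512_of(p)}  {p.relative_to(bag_dir).as_posix()}" for p in payload]
    manifest = bag_dir / "manifest-sha512.txt"
    manifest.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines
```

Calling rewrite_payload_manifest(Path("some/bag/dir")) would overwrite the payload manifest in place, so try it on a copy first.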
Also not behaving in a useful way: without -I, whatever is specified as DEST does not get used directly as output directory. Instead, a new subdirectory with a randomized name gets used (silently).
I will revisit this as soon as possible in the context of revamping our GT setup.
Hi, I recently needed to recreate checksums for bagits and wanted to know if ocrd can do it. That is how I came across this issue. I hope you don't mind me digging this issue up again and that I understand it correctly.
My first question is what your goal is? Is it to recreate/fix the checksums with ocrd or is there a problem with the -I switch (if so I don't understand that)? Because I think -I does exactly what is stated in the description (even if that is pretty useless). The description says: "Replace workspace with bag (like bagit.py does)". The switch replaces/deletes a workspace (mets and filegroups, just a workspace, not a bag) and puts the bagit contents (the data dir containing the workspace, bag-info.txt, manifest-sha512.txt etc.) where the workspace used to be.
What I would want to change is this: with ocrd zip bag -I -d some/bag/dir foo/, the bag at some/bag/dir (or rather the contents of its data/ and its bag-info.txt) would be used to create a new bag, which is put into foo/.
> My first question is what your goal is?
For bag -Z -I it is simply to update the bag checksums (say after some changes to the METS and files).
> Is it to recreate/fix the checksums with ocrd or is there a problem with the -I switch (if so I don't understand that)?
For bag -Z there is a problem with the chosen output directory – DEST does not get used (see above).
For bag -I I don't know what I should expect. Replacing the directory with a zip does not make much sense to me (perhaps due to lack of imagination).
> Because I think -I does exactly what is stated in the description (even if that is pretty useless): The description says: "Replace workspace with bag (like bagit.py does)". The switch replaces/deletes a workspace (mets and filegroups, just a workspace not a bag) and puts the bagit-stuff: data-dir (containing the workspace), bag-info.txt, manifest-sha512.txt etc to where the workspace has been.
I see. But how can you have a (zip) file where a directory was – without renaming it to *.zip?
BTW, I believe the cause of the misbehaviour might just be the wrong order of the arguments:
https://github.com/OCR-D/core/blob/e841ce8443ec7e0fa90e99796b3a947be09844d9/ocrd/ocrd/cli/zip.py#L31-L48
Notice how directory and dest have been confused.
Thanks for answering.
> For bag -Z there is a problem with the chosen output directory – DEST does not get used (see above).
In my tests it worked as expected. (Although if DEST is not specified, it gets a strange default value).
My Testcases:
- workspace: I have /tmp/workspace-1, which contains mets.xml and filegroup-dirs with files
- command: ocrd zip bag -Z -d /tmp/workspace-1 /tmp/workspace-1-output -i something
- result: /tmp/workspace-1-output contains data/mets.xml, data/FILEGROUP and bag-files as expected.
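A result like this can also be checked mechanically. The following is a rough Python sketch that compares the payload files against manifest-sha512.txt; it assumes the standard BagIt manifest format (checksum, whitespace, path relative to the bag root), and the function name is made up for illustration. It is not a full BagIt validator.

```python
import hashlib
from pathlib import Path


def find_checksum_mismatches(bag_dir: Path) -> list[str]:
    """Return manifest paths that are missing or whose SHA-512 differs."""
    mismatches = []
    manifest = bag_dir / "manifest-sha512.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            mismatches.append(rel_path)
            continue
        actual = hashlib.sha512(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

For the test case above, find_checksum_mismatches(Path("/tmp/workspace-1-output")) should return an empty list if the bag is consistent.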
> Notice how directory and dest have been confused.
I don't see what you mean. I see that dest is specified as a click argument before directory, while in the function declaration it is the other way around. Did you mean that? But when delegating to the workspace bagger, the arguments are passed correctly as kwargs, so I don't see what could be wrong here and, as I said, it worked in my tests.
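To illustrate the kwargs point without click: when a handler is called with keyword arguments, the order in which the parameters were declared elsewhere does not matter. This is a plain-Python sketch with hypothetical names (bag_handler, parsed); it does not reproduce the actual code in zip.py.

```python
def bag_handler(directory, dest):
    """Hypothetical handler whose parameter order differs from the declaration order."""
    return {"source": directory, "target": dest}


# Values as a CLI parser might collect them, keyed by parameter name,
# with dest declared before directory.
parsed = {"dest": "/tmp/workspace-1-output", "directory": "/tmp/workspace-1"}

# Keyword unpacking binds by name, so the two values cannot be swapped.
result = bag_handler(**parsed)
```

A mix-up could only happen if the values were passed positionally somewhere along the way.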
From my point of view this can be closed because of #964 and #951. I think I used the fix keyword incorrectly in #951, which is why this issue didn't close automatically.
Yes, this should be fixed in https://github.com/OCR-D/core/releases/tag/v2.44.0.
AFAICS the last remaining issue is
> For bag -Z there is a problem with the chosen output directory – DEST does not get used (see above).
Which I cannot reproduce (anymore). For example, if I do
cd repo/assets/data/kant_aufklaerung_1784
ocrd zip bag -d data -Z -I foo
I get the OCRD-ZIP I expect, without nested data. So all seems good now. Thanks @joschrew for fixing this.
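For anyone who wants to check for the nested data/data problem described at the top, here is a small Python sketch using the standard zipfile module. It assumes the bag root sits at the top level of the archive; the function name is made up for illustration.

```python
import zipfile


def nested_data_entries(zip_path) -> list[str]:
    """Return archive member names that sit under a nested data/data/ directory."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.startswith("data/data/")]
```

An empty list means the payload was not wrapped in a second data directory.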