archive-hocr-tools
hocr-to-epub: require hocr_xml_file_path to end with _hocr.html
Quick fix to avoid broken file paths:
require hocr_xml_file_path to end with _hocr.html.
The user can bypass this requirement by setting all file paths explicitly.
Before this patch, ImageStack.parse_stack tried to open the hOCR file as if it were the jp2.zip image stack:
$ hocr-to-epub -f 001.hocr -o 001.epub
Traceback (most recent call last):
  File "/nix/store/km6ybsgiig9bnrw2n5csw8ivasamsn90-archive-hocr-tools-1.1.67/bin/.hocr-to-epub-wrapped", line 750, in <module>
    EpubGenerator(
    ~~~~~~~~~~~~~^
    args.infile,
    ^^^^^^^^^^^^
    ...<4 lines>...
    use_kakadu=args.kakadu,
    ^^^^^^^^^^^^^^^^^^^^^^^
    ignore_broken_images=args.ignore_broken_images)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/km6ybsgiig9bnrw2n5csw8ivasamsn90-archive-hocr-tools-1.1.67/bin/.hocr-to-epub-wrapped", line 327, in __init__
    self.img_stack = ImageStack(
                     ~~~~~~~~~~^
    self.image_stack_zip_file_path,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    os.path.join(WORKING_DIR,"epub_img"),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    use_kakadu=use_kakadu,
    ^^^^^^^^^^^^^^^^^^^^^^
    ignore_broken_images=ignore_broken_images)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/km6ybsgiig9bnrw2n5csw8ivasamsn90-archive-hocr-tools-1.1.67/bin/.hocr-to-epub-wrapped", line 82, in __init__
    self.parse_stack()
    ~~~~~~~~~~~~~~~~^^
  File "/nix/store/km6ybsgiig9bnrw2n5csw8ivasamsn90-archive-hocr-tools-1.1.67/bin/.hocr-to-epub-wrapped", line 99, in parse_stack
    self.zf = tarfile.open(self.image_archive_file_path)
              ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/sd81bvmch7njdpwx3lkjslixcbj5mivz-python3-3.13.4/lib/python3.13/tarfile.py", line 1882, in open
    raise ReadError(f"file could not be opened successfully:\n{error_msgs_summary}")
tarfile.ReadError: file could not be opened successfully:
- method gz: ReadError('not a gzip file')
- method bz2: ReadError('not a bzip2 file')
- method xz: ReadError('not an lzma file')
- method tar: ReadError('invalid header')
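The failure mode can be reproduced in isolation. This is only a minimal sketch, assuming the image stack path is derived from the hOCR path by a suffix substitution. The helper `derive_image_stack_path` and that derivation are assumptions made for illustration, not code taken from archive-hocr-tools. When the suffix is missing, the substitution silently returns the input path, so the hOCR file itself is handed to `tarfile.open`.

```python
import os
import tarfile
import tempfile


def derive_image_stack_path(hocr_path: str) -> str:
    # Hypothetical derivation (an assumption, not the tool's actual code):
    # swap the hOCR suffix for the image archive suffix.
    # str.replace is a silent no-op when the suffix is absent.
    return hocr_path.replace("_hocr.html", "_jp2.zip")


def try_open_as_tar(path: str) -> str:
    # Report whether tarfile accepts the file or rejects it with ReadError.
    try:
        with tarfile.open(path):
            return "opened"
    except tarfile.ReadError:
        return "ReadError"


with tempfile.TemporaryDirectory() as tmp:
    # A file named like the one in the report: no "_hocr.html" suffix.
    hocr_path = os.path.join(tmp, "001.hocr")
    with open(hocr_path, "w", encoding="utf-8") as f:
        f.write("<html><body>hOCR markup, not an archive</body></html>\n")

    image_path = derive_image_stack_path(hocr_path)
    same_path = image_path == hocr_path  # the substitution changed nothing
    result = try_open_as_tar(image_path)

print(same_path, result)
```

Under that assumption the error surfaces deep inside the archive handling, far from the badly named input that caused it.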
After this patch, it fails early:
$ hocr-to-epub -f 001.hocr -o 001.epub
Traceback (most recent call last):
  File "/nix/store/km6ybsgiig9bnrw2n5csw8ivasamsn90-archive-hocr-tools-1.1.67/bin/.hocr-to-epub-wrapped", line 750, in <module>
    EpubGenerator(
    ~~~~~~~~~~~~~^
    args.infile,
    ^^^^^^^^^^^^
    ...<4 lines>...
    use_kakadu=args.kakadu,
    ^^^^^^^^^^^^^^^^^^^^^^^
    ignore_broken_images=args.ignore_broken_images)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/km6ybsgiig9bnrw2n5csw8ivasamsn90-archive-hocr-tools-1.1.67/bin/.hocr-to-epub-wrapped", line 307, in __init__
    assert self.hocr_xml_file_path.endswith('_hocr.html')
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
AssertionError
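The bare AssertionError carries no message, and `assert` statements are skipped entirely when Python runs with `-O`. As a possible follow-up, here is a sketch of the same guard with a descriptive error. The function name `validate_hocr_path`, the `other_paths` mapping, and the choice of `ValueError` are all hypothetical and not part of this patch. The bypass described above is modelled by skipping the check when every other path is given explicitly.

```python
from typing import Optional

HOCR_SUFFIX = "_hocr.html"


def validate_hocr_path(
    hocr_xml_file_path: str,
    other_paths: dict[str, Optional[str]],
) -> None:
    """Fail early if sibling paths would be derived from a badly named hOCR file.

    other_paths maps a (hypothetical) parameter name to an explicit path,
    or to None when that path is left to be derived from the hOCR path.
    """
    # Bypass: if every other path is explicit, nothing is derived.
    if all(path is not None for path in other_paths.values()):
        return

    if not hocr_xml_file_path.endswith(HOCR_SUFFIX):
        missing = sorted(name for name, path in other_paths.items() if path is None)
        raise ValueError(
            f"{hocr_xml_file_path!r} does not end with {HOCR_SUFFIX!r}; "
            f"rename it or set these paths explicitly: {', '.join(missing)}"
        )


try:
    validate_hocr_path("001.hocr", {"image_stack_zip_file_path": None})
except ValueError as exc:
    message = str(exc)
    print(message)
```

Raising an exception instead of asserting keeps the check active under `-O` and tells the user which paths to supply.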