ocrd_cis
AssertionError from add_baseline(geom)
I have run into a failure on one specific page, but I am not sure what exactly the problem is, other than that the page seems to be empty:
20:43:59.071 DEBUG ocrd.processor.helpers.run_processor - Running processor <class 'ocrd_cis.ocropy.segment.OcropySegment'>
20:43:59.071 DEBUG ocrd.processor.helpers.run_processor - Processor instance <ocrd_cis.ocropy.segment.OcropySegment object at 0x7f3d684b9220> (ocrd-cis-ocropy-segment v0.1.5 doing layout/segmentation/region)
20:43:59.072 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'mimetype': None, 'page_id': 'PHYS_0510', 'file_grp': 'OCR-D-CLIP'})
20:43:59.246 DEBUG ocrd.processor.base - adding file FILE_0510_OCR-D-CLIP for page PHYS_0510 to input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0000.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0001.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0004.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0007.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0008.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0009.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0010.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0011.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.workspace.download_file - 'local_filename' OCR-D-CLIP/FILE_0510_OCR-D-CLIP.xml already within /vd18_data/PPN831977752_513pages - nothing to do
20:43:59.249 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0510_DEFAULT.jpg'})
20:43:59.706 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0510_DEFAULT.jpg'})
20:44:00.485 DEBUG ocrd.workspace.image_from_page - page 'FILE_0510_OCR-D-CLIP' has border, orientation=0 skew=0.00
20:44:00.485 DEBUG ocrd.workspace.image_from_page - Using AlternativeImage 5 {'', 'deskewed', 'cropped', 'binarized', 'clipped', 'despeckled'} for page 'FILE_0510_OCR-D-CLIP'
20:44:00.485 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-SEG-BLOCK-TESSERACT/FILE_0510_OCR-D-SEG-BLOCK-TESSERACT.IMG-BIN.png'})
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-46 -49]
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-359. -618.5]
20:44:01.273 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.00° around [359. 618.5]
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [359. 618.5]
20:44:01.274 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:01.277 DEBUG ocrd.utils.crop_image - cropping image to (346, 828, 381, 863)
20:44:01.278 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-346 -828]
20:44:01.279 DEBUG ocrd.workspace.image_from_segment - segment 'region0000' has orientation=0 skew=0.01
20:44:01.279 DEBUG ocrd.workspace.image_from_segment - Using AlternativeImage 1 {'despeckled', 'clipped', 'binarized'} for segment 'region0000'
20:44:01.279 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-CLIP/FILE_0510_OCR-D-CLIP_region0000.IMG-CLIP.png'})
20:44:02.078 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-17.5 -17.5]
20:44:02.078 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.01° around [17.5 17.5]
20:44:02.079 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [17.50371747 17.50371747]
20:44:02.079 DEBUG ocrd.workspace.image_from_segment - Rotating AlternativeImage for segment 'region0000' by 0.01°
20:44:02.079 DEBUG ocrd.utils.rotate_image - rotating image by 0.01°
20:44:02.079 DEBUG ocrd.workspace.image_from_segment - Recropping AlternativeImage for segment 'region0000'
20:44:02.080 DEBUG ocrd.utils.crop_image - cropping image to (0, 0, 35, 35)
20:44:02.080 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:02.083 DEBUG ocrd.utils.crop_image - cropping image to (261, 435, 367, 845)
20:44:02.085 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-261 -435]
20:44:02.085 DEBUG ocrd.workspace.image_from_segment - segment 'region0001' has orientation=0 skew=0.01
20:44:02.085 DEBUG ocrd.workspace.image_from_segment - Using AlternativeImage 1 {'despeckled', 'clipped', 'binarized'} for segment 'region0001'
20:44:02.085 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-CLIP/FILE_0510_OCR-D-CLIP_region0001.IMG-CLIP.png'})
20:44:02.835 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [ -53. -205.]
20:44:02.836 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.01° around [ 53. 205.]
20:44:02.836 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [ 53.04355096 205.0112552 ]
20:44:02.836 DEBUG ocrd.workspace.image_from_segment - Rotating AlternativeImage for segment 'region0001' by 0.01°
20:44:02.836 DEBUG ocrd.utils.rotate_image - rotating image by 0.01°
20:44:02.837 DEBUG ocrd.workspace.image_from_segment - Recropping AlternativeImage for segment 'region0001'
20:44:02.839 DEBUG ocrd.utils.crop_image - cropping image to (0, 0, 106, 410)
20:44:02.839 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:03.055 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-cis-ocropy-segment'
Traceback (most recent call last):
File "/home/mm/repos/core/build/__editable__.ocrd-2.65.0-py3-none-any/ocrd/processor/helpers.py", line 130, in run_processor
processor.process()
File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 500, in process
self._process_element(region, ignore, region_image, region_coords,
File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 788, in _process_element
line_polygons, _ = masks2polygons(line_labels, baselines, element_bin,
File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 234, in masks2polygons
base = join_baselines([baseline.intersection(polygon)
File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 959, in join_baselines
add_baseline(geom)
File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 951, in add_baseline
assert all(p1[0] < p2[0] for p1, p2 in zip(result[:-1], result[1:])), result
AssertionError: [(52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0), (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0), (91.0, 274.0), (91.0, 282.0), (92.0, 274.0), (92.0, 282.0), (93.0, 274.0), (93.0, 282.0), (94.0, 273.0), (94.0, 282.0), (95.0, 273.0), (95.0, 282.0), (96.0, 273.0), (96.0, 283.0), (97.0, 273.0), (97.0, 283.0), (98.0, 273.0), (98.0, 283.0), (99.0, 273.0), (99.0, 283.0), (100.0, 272.0), (100.0, 283.0), (101.0, 272.0), (101.0, 283.0), (102.0, 272.0), (102.0, 283.0), (103.0, 272.0), (103.0, 283.0), (104.0, 272.0), (104.0, 283.0), (105.0, 272.0)]
The workflow used:
cis-ocropy-binarize -I DEFAULT -O OCR-D-BINPAGE -P dpi 300
anybaseocr-crop -I OCR-D-BINPAGE -O OCR-D-SEG-PAGE-ANYOCR -P dpi 300
cis-ocropy-denoise -I OCR-D-SEG-PAGE-ANYOCR -O OCR-D-DENOISE-OCROPY -P dpi 300
cis-ocropy-deskew -I OCR-D-DENOISE-OCROPY -O OCR-D-DESKEW-OCROPY -P level-of-operation page
tesserocr-segment-region -I OCR-D-DESKEW-OCROPY -O OCR-D-SEG-BLOCK-TESSERACT -P dpi 300 -P padding 5.0 -P find_tables false
segment-repair -I OCR-D-SEG-BLOCK-TESSERACT -O OCR-D-SEGMENT-REPAIR -P plausibilize true -P plausibilize_merge_min_overlap 0.7
cis-ocropy-clip -I OCR-D-SEGMENT-REPAIR -O OCR-D-CLIP
cis-ocropy-segment -I OCR-D-CLIP -O OCR-D-SEGMENT-OCROPY -P dpi 300
cis-ocropy-dewarp -I OCR-D-SEGMENT-OCROPY -O OCR-D-DEWARP
tesserocr-recognize -I OCR-D-DEWARP -O OCR-D-OCR -P model Fraktur
Here is the problematic image of page 510:
It is worth mentioning that other similar pages did not fail. E.g. pages 508 and 509:
Thanks @MehmedGIT for the detailed report!
What this means is that, while joining the baseline segments for the line and ordering them by their x coordinate, the resulting sequence did not turn out strictly monotonic: from x = 89 onward, x values repeat, with y alternating between about 272-275 and 281-283.
Obviously, we are trying to extract a single baseline from two neighbouring lines here.
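To make the failed condition concrete, here is a minimal sketch that re-runs the same comparison as the assert in `add_baseline` on the first few points from the reported `AssertionError`. The helper names `strictly_increasing` and `first_violation`, and the y threshold of 278 used to separate the two candidate lines, are hypothetical and chosen only for illustration; they are not part of ocrd_cis.

```python
# First eight points copied from the AssertionError in the report (list abbreviated).
result = [(52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0),
          (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0)]


def strictly_increasing(points):
    """Same condition as the failing assert: every x must exceed the previous x."""
    return all(p1[0] < p2[0] for p1, p2 in zip(points[:-1], points[1:]))


def first_violation(points):
    """Hypothetical helper: index of the first point whose x does not exceed
    its predecessor's x, or None if the sequence is strictly increasing."""
    for i, (p1, p2) in enumerate(zip(points[:-1], points[1:]), start=1):
        if not p1[0] < p2[0]:
            return i
    return None


is_strict = strictly_increasing(result)
idx = first_violation(result)

# Illustration only: split the repeating tail by an assumed y threshold (278).
# Each half is strictly increasing on its own, which fits the reading that
# two neighbouring lines ended up in a single baseline.
tail = result[4:]
upper = [p for p in tail if p[1] < 278]
lower = [p for p in tail if p[1] >= 278]

print(is_strict, idx, result[idx])
print(strictly_increasing(upper), strictly_increasing(lower))
```

The first repeated x is at index 5, where (89.0, 281.0) follows (89.0, 275.0).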
I'll try to reproduce and see what I can do.
Note: I cannot reproduce this with the current head of https://github.com/bertsky/ocrd_cis/tree/fix-alpha-shape together with ocrd_tesserocr (based on Tesseract 5.3.4) and ocrd_segment. The workflow runs through; here is a screenshot from OCRD Browser:
Could you please try to update said modules and try again?
I will. Thanks for having a look.