AssertionError from add_baseline(geom)

Open MehmedGIT opened this issue 1 year ago • 3 comments

I have found an issue for a specific page but I am not sure what exactly the problem is other than that the page seems empty:

20:43:59.071 DEBUG ocrd.processor.helpers.run_processor - Running processor <class 'ocrd_cis.ocropy.segment.OcropySegment'>
20:43:59.071 DEBUG ocrd.processor.helpers.run_processor - Processor instance <ocrd_cis.ocropy.segment.OcropySegment object at 0x7f3d684b9220> (ocrd-cis-ocropy-segment v0.1.5 doing layout/segmentation/region)
20:43:59.072 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'mimetype': None, 'page_id': 'PHYS_0510', 'file_grp': 'OCR-D-CLIP'})
20:43:59.246 DEBUG ocrd.processor.base - adding file FILE_0510_OCR-D-CLIP for page PHYS_0510 to input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0000.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0001.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0004.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0007.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0008.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0009.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0010.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0011.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.workspace.download_file - 'local_filename' OCR-D-CLIP/FILE_0510_OCR-D-CLIP.xml already within /vd18_data/PPN831977752_513pages - nothing to do
20:43:59.249 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0510_DEFAULT.jpg'})
20:43:59.706 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0510_DEFAULT.jpg'})
20:44:00.485 DEBUG ocrd.workspace.image_from_page - page 'FILE_0510_OCR-D-CLIP' has border, orientation=0 skew=0.00
20:44:00.485 DEBUG ocrd.workspace.image_from_page - Using AlternativeImage 5 {'', 'deskewed', 'cropped', 'binarized', 'clipped', 'despeckled'} for page 'FILE_0510_OCR-D-CLIP'
20:44:00.485 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-SEG-BLOCK-TESSERACT/FILE_0510_OCR-D-SEG-BLOCK-TESSERACT.IMG-BIN.png'})
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-46 -49]
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-359.  -618.5]
20:44:01.273 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.00° around [359.  618.5]
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [359.  618.5]
20:44:01.274 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:01.277 DEBUG ocrd.utils.crop_image - cropping image to (346, 828, 381, 863)
20:44:01.278 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-346 -828]
20:44:01.279 DEBUG ocrd.workspace.image_from_segment - segment 'region0000' has orientation=0 skew=0.01
20:44:01.279 DEBUG ocrd.workspace.image_from_segment - Using AlternativeImage 1 {'despeckled', 'clipped', 'binarized'} for segment 'region0000'
20:44:01.279 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-CLIP/FILE_0510_OCR-D-CLIP_region0000.IMG-CLIP.png'})
20:44:02.078 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-17.5 -17.5]
20:44:02.078 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.01° around [17.5 17.5]
20:44:02.079 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [17.50371747 17.50371747]
20:44:02.079 DEBUG ocrd.workspace.image_from_segment - Rotating AlternativeImage for segment 'region0000' by 0.01°
20:44:02.079 DEBUG ocrd.utils.rotate_image - rotating image by 0.01°
20:44:02.079 DEBUG ocrd.workspace.image_from_segment - Recropping AlternativeImage for segment 'region0000'
20:44:02.080 DEBUG ocrd.utils.crop_image - cropping image to (0, 0, 35, 35)
20:44:02.080 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:02.083 DEBUG ocrd.utils.crop_image - cropping image to (261, 435, 367, 845)
20:44:02.085 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-261 -435]
20:44:02.085 DEBUG ocrd.workspace.image_from_segment - segment 'region0001' has orientation=0 skew=0.01
20:44:02.085 DEBUG ocrd.workspace.image_from_segment - Using AlternativeImage 1 {'despeckled', 'clipped', 'binarized'} for segment 'region0001'
20:44:02.085 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-CLIP/FILE_0510_OCR-D-CLIP_region0001.IMG-CLIP.png'})
20:44:02.835 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [ -53. -205.]
20:44:02.836 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.01° around [ 53. 205.]
20:44:02.836 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [ 53.04355096 205.0112552 ]
20:44:02.836 DEBUG ocrd.workspace.image_from_segment - Rotating AlternativeImage for segment 'region0001' by 0.01°
20:44:02.836 DEBUG ocrd.utils.rotate_image - rotating image by 0.01°
20:44:02.837 DEBUG ocrd.workspace.image_from_segment - Recropping AlternativeImage for segment 'region0001'
20:44:02.839 DEBUG ocrd.utils.crop_image - cropping image to (0, 0, 106, 410)
20:44:02.839 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:03.055 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-cis-ocropy-segment'
Traceback (most recent call last):
  File "/home/mm/repos/core/build/__editable__.ocrd-2.65.0-py3-none-any/ocrd/processor/helpers.py", line 130, in run_processor
    processor.process()
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 500, in process
    self._process_element(region, ignore, region_image, region_coords,
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 788, in _process_element
    line_polygons, _ = masks2polygons(line_labels, baselines, element_bin,
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 234, in masks2polygons
    base = join_baselines([baseline.intersection(polygon)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 959, in join_baselines
    add_baseline(geom)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 951, in add_baseline
    assert all(p1[0] < p2[0] for p1, p2 in zip(result[:-1], result[1:])), result
AssertionError: [(52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0), (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0), (91.0, 274.0), (91.0, 282.0), (92.0, 274.0), (92.0, 282.0), (93.0, 274.0), (93.0, 282.0), (94.0, 273.0), (94.0, 282.0), (95.0, 273.0), (95.0, 282.0), (96.0, 273.0), (96.0, 283.0), (97.0, 273.0), (97.0, 283.0), (98.0, 273.0), (98.0, 283.0), (99.0, 273.0), (99.0, 283.0), (100.0, 272.0), (100.0, 283.0), (101.0, 272.0), (101.0, 283.0), (102.0, 272.0), (102.0, 283.0), (103.0, 272.0), (103.0, 283.0), (104.0, 272.0), (104.0, 283.0), (105.0, 272.0)]

The used workflow:

cis-ocropy-binarize      -I DEFAULT                   -O OCR-D-BINPAGE             -P dpi 300
anybaseocr-crop          -I OCR-D-BINPAGE             -O OCR-D-SEG-PAGE-ANYOCR     -P dpi 300
cis-ocropy-denoise       -I OCR-D-SEG-PAGE-ANYOCR     -O OCR-D-DENOISE-OCROPY      -P dpi 300
cis-ocropy-deskew        -I OCR-D-DENOISE-OCROPY      -O OCR-D-DESKEW-OCROPY       -P level-of-operation page
tesserocr-segment-region -I OCR-D-DESKEW-OCROPY       -O OCR-D-SEG-BLOCK-TESSERACT -P dpi 300 -P padding 5.0  -P find_tables false
segment-repair           -I OCR-D-SEG-BLOCK-TESSERACT -O OCR-D-SEGMENT-REPAIR      -P plausibilize true       -P plausibilize_merge_min_overlap 0.7
cis-ocropy-clip          -I OCR-D-SEGMENT-REPAIR      -O OCR-D-CLIP
cis-ocropy-segment       -I OCR-D-CLIP                -O OCR-D-SEGMENT-OCROPY      -P dpi 300
cis-ocropy-dewarp        -I OCR-D-SEGMENT-OCROPY      -O OCR-D-DEWARP
tesserocr-recognize      -I OCR-D-DEWARP              -O OCR-D-OCR                 -P model Fraktur

Here is the problematic image of page 510: FILE_0510_DEFAULT

It is worth mentioning that other similar pages did not fail. E.g. pages 508 and 509:

FILE_0508_DEFAULT FILE_0509_DEFAULT

MehmedGIT avatar May 22 '24 11:05 MehmedGIT

  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 951, in add_baseline
    assert all(p1[0] < p2[0] for p1, p2 in zip(result[:-1], result[1:])), result
AssertionError: [(52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0), (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0), (91.0, 274.0), (91.0, 282.0), (92.0, 274.0), (92.0, 282.0), (93.0, 274.0), (93.0, 282.0), (94.0, 273.0), (94.0, 282.0), (95.0, 273.0), (95.0, 282.0), (96.0, 273.0), (96.0, 283.0), (97.0, 273.0), (97.0, 283.0), (98.0, 273.0), (98.0, 283.0), (99.0, 273.0), (99.0, 283.0), (100.0, 272.0), (100.0, 283.0), (101.0, 272.0), (101.0, 283.0), (102.0, 272.0), (102.0, 283.0), (103.0, 272.0), (103.0, 283.0), (104.0, 272.0), (104.0, 283.0), (105.0, 272.0)]

Thanks @MehmedGIT for the detailed report!

What this means is that while trying to join the baseline segments for the line, ordered by their x coordinate, the resulting sequence did not turn out strictly monotonic (several x values occur twice, with different y values):

baseline-points

Obviously, we are trying to extract a single baseline from two neighbouring lines here.
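
To make the failed check concrete, here is a minimal sketch that applies the same condition as the assertion in the traceback to the first few points from the error message. The helper names `is_strictly_increasing_x` and `first_violation` are made up for this illustration and are not part of ocrd_cis; only the comparison `p1[0] < p2[0]` is taken from the traceback.

```python
# First eight points from the AssertionError message above.
points = [
    (52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0),
    (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0),
]


def is_strictly_increasing_x(pts):
    """Same condition as the assert in add_baseline (per the traceback)."""
    return all(p1[0] < p2[0] for p1, p2 in zip(pts[:-1], pts[1:]))


def first_violation(pts):
    """Hypothetical helper: index and pair of the first non-increasing step."""
    for i, (p1, p2) in enumerate(zip(pts[:-1], pts[1:])):
        if not p1[0] < p2[0]:
            return i, p1, p2
    return None


print(is_strictly_increasing_x(points))  # False
print(first_violation(points))           # (4, (89.0, 275.0), (89.0, 281.0))
```

From x = 89 onwards every x value appears twice, once with y around 272 to 275 and once with y around 281 to 283, which is consistent with two separate baselines having been merged into one point list.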

I'll try to reproduce and see what I can do.

bertsky avatar May 24 '24 16:05 bertsky

Note: I cannot reproduce with current head of https://github.com/bertsky/ocrd_cis/tree/fix-alpha-shape and ocrd_tesserocr (based on Tesseract 5.3.4) and ocrd_segment. The workflow runs through – here is a screenshot from OCRD Browser:

OCR-D-OCR

Could you please try to update said modules and try again?

bertsky avatar May 29 '24 23:05 bertsky

Could you please try to update said modules and try again?

I will. Thanks for having a look.

MehmedGIT avatar May 30 '24 07:05 MehmedGIT