Error with blank png: Make sure that the channel dimension of the pixel values match with the one set in the configuration.

Open Fogapod opened this issue 7 months ago • 2 comments

Bug

I have an image without text that fails docling conversion. This happens both on macOS and on a machine with an NVIDIA GPU.

https://github.com/user-attachments/assets/1518021f-70c7-4a98-8994-09b1b305e3e0 Note: the file is named .svg, but it is a PNG image.

Steps to reproduce

uvx docling==2.31.0 vec.svg --pdf-backend dlparse_v4 --to md

The channel dimension is ambiguous. Got image shape (1, 1, 3). Assuming channels are the first dimension.
WARNING:docling.pipeline.base_pipeline:Encountered an error during conversion of document 90a2134105ce90eb548541bc22129b7d2766d7a83877d56622c345d73fa6863e:
Traceback (most recent call last):

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 160, in _build_document
    for p in pipeline_pages:  # Must exhaust!
             ^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 126, in _apply_on_pages
    yield from page_batch

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/page_assemble_model.py", line 69, in __call__
    for page in page_batch:
                ^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/table_structure_model.py", line 181, in __call__
    for page in page_batch:
                ^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/layout_model.py", line 157, in __call__
    for ix, pred_item in enumerate(
                         ^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling_ibm_models/layoutmodel/layout_predictor.py", line 143, in predict
    outputs = self._model(**inputs)
              ^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 2003, in forward
    outputs = self.model(
              ^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 1719, in forward
    features = self.backbone(pixel_values, pixel_mask)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 535, in forward
    features = self.model(pixel_values).feature_maps
               ^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr_resnet.py", line 413, in forward
    embedding_output = self.embedder(pixel_values)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr_resnet.py", line 108, in forward
    raise ValueError(

ValueError: Make sure that the channel dimension of the pixel values match with the one set in the configuration.
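
A note on how the warning at the top of the log and the final `ValueError` may be related. The shape `(1, 1, 3)` means height 1, width 1, and 3 channels, but read from the other end it also looks like a 1-channel image of size 1x3. The sketch below is a simplified, standalone reimplementation of that kind of guess, not the transformers code itself; the function names `infer_channel_axis` and `channels_seen` and the candidate channel counts `(1, 3)` are assumptions made for illustration.

```python
from typing import Tuple

Shape = Tuple[int, int, int]


def infer_channel_axis(shape: Shape, num_channels=(1, 3)) -> Tuple[str, bool]:
    """Guess which axis holds the channels; return (guess, ambiguous)."""
    first_ok = shape[0] in num_channels
    last_ok = shape[-1] in num_channels
    if first_ok and last_ok:
        # Both ends look like a channel count: fall back to channels-first,
        # which is what the warning in the log says happened.
        return "first", True
    if first_ok:
        return "first", False
    if last_ok:
        return "last", False
    raise ValueError(f"cannot infer channel axis from shape {shape}")


def channels_seen(shape: Shape) -> int:
    """Number of channels a model would see under the guessed layout."""
    guess, _ = infer_channel_axis(shape)
    return shape[0] if guess == "first" else shape[-1]


for shape in [(1, 1, 3), (640, 480, 3)]:
    guess, ambiguous = infer_channel_axis(shape)
    print(shape, guess, ambiguous, channels_seen(shape))
# (1, 1, 3) first True 1
# (640, 480, 3) last False 3
```

If the 1x1 PNG is read as channels-first, the model sees 1 channel where its configuration presumably expects 3, which would be consistent with the error above.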

Docling version

Docling version: 2.31.0
Docling Core version: 2.28.1
Docling IBM Models version: 3.4.2
Docling Parse version: 4.0.1
Python: cpython-312 (3.12.7)
Platform: macOS-15.3.1-arm64-arm-64bit

Python version

3.12

Fogapod, Apr 30 '25 16:04

@Fogapod I agree we should safeguard the docling code against this edge case. Still, this appears to be a 1x1 pixel PNG. Do you also see this problem on a real-world document?

cau-git, May 21 '25 13:05

No, this is the only file I've found that triggers this.

Fogapod, May 21 '25 16:05
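
One possible shape for the safeguard mentioned in the thread, as a minimal sketch only. It assumes page images arrive as height x width x channels arrays and that a layout is "ambiguous" when the height could also be read as a channel count. `PLAUSIBLE_CHANNELS`, `is_layout_ambiguous`, and `plan_page` are hypothetical names, not part of docling or transformers, and skipping the page is just one policy among several.

```python
from typing import Tuple

# Channel counts treated as plausible when guessing the layout (assumption).
PLAUSIBLE_CHANNELS = (1, 3)


def is_layout_ambiguous(hwc_shape: Tuple[int, int, int]) -> bool:
    """True when an H x W x C shape could also be read as C x H x W."""
    height, _, channels = hwc_shape
    return height in PLAUSIBLE_CHANNELS and channels in PLAUSIBLE_CHANNELS


def plan_page(hwc_shape: Tuple[int, int, int]) -> str:
    """Hypothetical policy: skip layout detection on ambiguous pages."""
    return "skip_layout" if is_layout_ambiguous(hwc_shape) else "run_layout"


print(plan_page((1, 1, 3)))       # the 1x1 PNG from this issue -> skip_layout
print(plan_page((1024, 768, 3)))  # an ordinary page image -> run_layout
```

Under this check an RGB image that is 3 pixels tall is also flagged, so the edge case is slightly wider than 1x1 images. An alternative to skipping would be to state the channel layout explicitly when the image is handed to the preprocessor, if the calling code allows it.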