docling
Error with blank PNG: Make sure that the channel dimension of the pixel values match with the one set in the configuration.
Bug
I have an image without any text, and docling fails to convert it. This happens on both macOS and an NVIDIA GPU.
https://github.com/user-attachments/assets/1518021f-70c7-4a98-8994-09b1b305e3e0 Note: it's named .svg, but it's actually a PNG image.
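If the attachment is unavailable, a similar input can be generated with the Python standard library alone. This is a minimal sketch that assumes the attachment is a plain 1x1 RGB PNG, which is consistent with the image shape (1, 1, 3) reported in the log; the pixel color (white), the helper names `png_chunk`/`make_png`, and the output name `blank.png` are all hypothetical choices, not taken from the original file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_png(width: int, height: int, rgb: tuple = (255, 255, 255)) -> bytes:
    """Return the bytes of a solid-color 8-bit RGB PNG (hypothetical helper)."""
    # IHDR: width, height, bit depth 8, color type 2 (truecolor RGB),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline is a filter-type byte (0 = None) followed by RGB triples.
    scanline = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(scanline * height)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    with open("blank.png", "wb") as f:
        f.write(make_png(1, 1))
```

The resulting file can then be passed to the same `uvx docling==2.31.0 ... --to md` command used below, under the assumption above that it matches the attachment.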
Steps to reproduce
uvx docling==2.31.0 vec.svg --pdf-backend dlparse_v4 --to md
The channel dimension is ambiguous. Got image shape (1, 1, 3). Assuming channels are the first dimension.
WARNING:docling.pipeline.base_pipeline:Encountered an error during conversion of document 90a2134105ce90eb548541bc22129b7d2766d7a83877d56622c345d73fa6863e:
Traceback (most recent call last):
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 160, in _build_document
for p in pipeline_pages: # Must exhaust!
^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 126, in _apply_on_pages
yield from page_batch
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/page_assemble_model.py", line 69, in __call__
for page in page_batch:
^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/table_structure_model.py", line 181, in __call__
for page in page_batch:
^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/layout_model.py", line 157, in __call__
for ix, pred_item in enumerate(
^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
response = gen.send(None)
^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling_ibm_models/layoutmodel/layout_predictor.py", line 143, in predict
outputs = self._model(**inputs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 2003, in forward
outputs = self.model(
^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 1719, in forward
features = self.backbone(pixel_values, pixel_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 535, in forward
features = self.model(pixel_values).feature_maps
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr_resnet.py", line 413, in forward
embedding_output = self.embedder(pixel_values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr_resnet.py", line 108, in forward
raise ValueError(
ValueError: Make sure that the channel dimension of the pixel values match with the one set in the configuration.
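The warning printed before the traceback ("The channel dimension is ambiguous. Got image shape (1, 1, 3)") appears to come from the image preprocessing in transformers, which guesses whether an array is channels-first or channels-last by checking whether the first or last axis has a plausible channel count. The sketch below is a simplified re-implementation of that idea, not the library code; the names `PLAUSIBLE_CHANNELS`, `infer_channel_axis`, and `check_channels` are hypothetical. It shows how a 1x1 RGB image (height 1, width 1, 3 channels) gets read as 1 channel and then fails a 3-channel check like the one raised in `modeling_rt_detr_resnet.py`.

```python
from typing import Sequence, Tuple

# Assumption: 1 (grayscale) and 3 (RGB) are treated as plausible channel counts.
PLAUSIBLE_CHANNELS: Sequence[int] = (1, 3)


def infer_channel_axis(shape: Tuple[int, int, int]) -> str:
    """Guess whether a 3-D image shape is channels-first or channels-last."""
    first_ok = shape[0] in PLAUSIBLE_CHANNELS
    last_ok = shape[-1] in PLAUSIBLE_CHANNELS
    if first_ok and last_ok:
        # Both ends look like a channel axis: fall back to channels-first,
        # matching what the warning in the log says it assumes.
        return "first"
    if first_ok:
        return "first"
    if last_ok:
        return "last"
    raise ValueError(f"cannot infer channel axis for shape {shape}")


def check_channels(shape: Tuple[int, int, int], expected: int = 3) -> None:
    """Mimic a backbone check that the channel count matches the model config."""
    axis = infer_channel_axis(shape)
    channels = shape[0] if axis == "first" else shape[-1]
    if channels != expected:
        raise ValueError(
            "Make sure that the channel dimension of the pixel values "
            "match with the one set in the configuration."
        )


check_channels((600, 800, 3))  # ordinary page render: channels-last, passes
try:
    check_channels((1, 1, 3))  # 1x1 RGB image: misread as 1 channel
except ValueError as err:
    print(err)
```

A height of 1 matches a plausible channel count, so the (1, 1, 3) shape is taken as 1 channel by 1 row by 3 columns rather than 1x1 pixels with 3 channels.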
Docling version
Docling version: 2.31.0
Docling Core version: 2.28.1
Docling IBM Models version: 3.4.2
Docling Parse version: 4.0.1
Python: cpython-312 (3.12.7)
Platform: macOS-15.3.1-arm64-arm-64bit
Python version
3.12
@Fogapod I agree we should safeguard the docling code against this edge case. Still, this appears to be a 1x1 pixel PNG. Do you also see this problem on a real-world document?
No, this is the only file I've found that triggers this.
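Until docling itself guards against this case, a batch job could filter out degenerate images before conversion. The following is a minimal sketch, not part of docling's API: it reads the width and height from the PNG IHDR header using only the standard library, which also identifies PNGs regardless of their file extension (as with the `.svg`-named file above). The helper names and the `MIN_SIDE` threshold are hypothetical; the threshold is an arbitrary assumption, not a documented docling limit.

```python
import struct
from typing import Iterable, List, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 4  # hypothetical minimum width/height in pixels


def png_size(header: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the first 24 bytes of a PNG, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def is_too_small(header: bytes, min_side: int = MIN_SIDE) -> bool:
    """True only for PNGs whose smaller side is below min_side."""
    size = png_size(header)
    return size is not None and min(size) < min_side


def filter_inputs(paths: Iterable[str], min_side: int = MIN_SIDE) -> List[str]:
    """Keep files that are not tiny PNGs; non-PNG files pass through unchanged."""
    kept = []
    for path in paths:
        with open(path, "rb") as f:
            if not is_too_small(f.read(24), min_side):
                kept.append(path)
    return kept
```

Non-PNG inputs are passed through untouched, so this only removes the narrow class of files discussed here rather than second-guessing every document.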