Could not load the custom kernel for multi-scale deformable attention.
Hey there,
The code previously ran without any problems, but now I get this error message:
Could not load the custom kernel for multi-scale deformable attention: /home/randbee/.cache/torch_extensions/py310_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Has anyone experienced the same thing?
It seems the docling installation is missing a file related to PyTorch.
I already tried:
pip install docling --extra-index-url https://download.pytorch.org/whl/cpu
As I am running the process on a CPU.
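The error message above points at a compiled library under the torch_extensions cache. A quick way to see whether that file was ever built is to look for it directly. This is only a sketch: the default root `~/.cache/torch_extensions` and the `<build tag>/<kernel name>/<kernel name>.so` layout are assumptions taken from the path in the error message, and the function names are made up for illustration.

```python
import os
from pathlib import Path

# Assumption: default cache root, as seen in the error message above.
DEFAULT_ROOT = Path(os.path.expanduser("~/.cache/torch_extensions"))
KERNEL_NAME = "MultiScaleDeformableAttention"


def find_kernel_builds(root, name=KERNEL_NAME):
    """Map each <root>/<build_tag>/<name> directory to whether <name>.so exists in it."""
    results = {}
    root = Path(root)
    if not root.is_dir():
        return results
    for build_dir in sorted(root.glob(f"*/{name}")):
        if build_dir.is_dir():
            results[str(build_dir)] = (build_dir / f"{name}.so").is_file()
    return results


if __name__ == "__main__":
    builds = find_kernel_builds(DEFAULT_ROOT)
    if not builds:
        print(f"No build directory for {KERNEL_NAME} under {DEFAULT_ROOT}")
    for path, has_library in builds.items():
        status = "compiled library present" if has_library else "compiled library missing"
        print(f"{path}: {status}")
```

If the build directory exists but the `.so` does not, that would be consistent with a compile step that started and never finished, though it does not prove it.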
I have the same issue when running docling in any environment. Weirdly enough, the error only pops up on the first call to docling's convert function and never shows up again for the same docling instance. It also seems to prevent docling from running on GPU (however, as jmvial said, it also pops up when I tell docling to use only the CPU).
Potential duplicate of #671
Had the same issue there. It seems the issue is with the newer torch==2.6 and the torchvision==0.21.0 that comes with it. I downgraded to torch==2.5.1 and torchvision==0.20.1. No issues so far. The exact version combination is taken from https://pytorch.org/get-started/previous-versions/
Extra information sources:
- https://github.com/huggingface/transformers/issues/35349
- https://github.com/huggingface/transformers/pull/35979
Pinning torch and torchvision worked for me as well.
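For anyone who wants to confirm which versions actually ended up in their environment, here is a minimal sketch that compares the installed packages against the pins mentioned above. The pins are just the combination reported in this thread, not an official recommendation, and the helper names are made up for illustration.

```python
from importlib import metadata

# Pins reported as working earlier in this thread; not an official recommendation.
PINS = {"torch": "2.5.1", "torchvision": "0.20.1"}


def installed_versions(packages):
    """Return {package: version}, with None for packages that are not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, pins):
    """Return {package: (installed, pinned)} for packages that differ from the pin.

    Local build suffixes such as "+cpu" or "+cu124" are ignored in the comparison.
    """
    diff = {}
    for name, pinned in pins.items():
        version = installed.get(name)
        base = version.split("+", 1)[0] if version else None
        if base != pinned:
            diff[name] = (version, pinned)
    return diff


if __name__ == "__main__":
    current = installed_versions(PINS)
    problems = mismatches(current, PINS)
    if not problems:
        print("Installed versions match the pins from this thread.")
    for name, (have, want) in problems.items():
        print(f"{name}: installed {have}, pinned {want}")
```

Running it inside the same virtual environment as docling shows whether another dependency pulled in a different torch version after the pin.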
Hello @sadaisystems and @TroyWilliams3687,
What version of Docling are you using? I tried pinning torch==2.5.1 and torchvision==0.20.1 with the latest Docling versions (2.24.0, 2.25.0) and the issue persists.
@jmmfcoutinho I ended up moving away from Tesseract as I couldn't get it running properly on Windows 11. I use EasyOCR now and I don't have the issue. I don't have the versions pinned anymore.
I was using v2.20 and v2.21 of docling before I made the change.
This is the setup to replicate the error. I'm working on Windows with WSL2 (Ubuntu 24.04.1 LTS, Python 3.12.3).
Repo structure
error
├── README.md
├── error.py
└── requirements.txt
error.py
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869" # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
# output: "## Docling Technical Report [...]"
requirements.txt
torch==2.5.1
torchvision==0.20.1
docling==2.25.0
Commands before running the script
python3.12 -m venv .venv
source .venv/bin/activate
pip install --no-cache-dir -r requirements.txt
Script running
# Run script
python error.py
Error logs
Could not load the custom kernel for multi-scale deformable attention: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
Could not load the custom kernel for multi-scale deformable attention: /home/jmmfcoutinho/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /home/jmmfcoutinho/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /home/jmmfcoutinho/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /home/jmmfcoutinho/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /home/jmmfcoutinho/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
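The log above actually contains two different causes behind the same prefix: the first line complains about CUDA_HOME, the later ones about the missing `.so` file. A small sketch for counting them separately in a captured log follows. The labels and function names are made up for illustration, and the matching relies only on the message text shown above.

```python
from collections import Counter

PREFIX = "Could not load the custom kernel for multi-scale deformable attention: "


def classify(line):
    """Return a short label for the cause stated after the prefix, or None for unrelated lines."""
    if not line.startswith(PREFIX):
        return None
    cause = line[len(PREFIX):]
    if "CUDA_HOME environment variable is not set" in cause:
        return "cuda_home_unset"
    if "cannot open shared object file" in cause:
        return "shared_object_missing"
    return "other"


def summarize(log_text):
    """Count how often each cause appears in a block of log text."""
    labels = (classify(line.strip()) for line in log_text.splitlines())
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    import sys

    # Usage: python error.py 2>&1 | python summarize_log.py
    for label, count in summarize(sys.stdin.read()).items():
        print(f"{label}: {count}")
```

One reading of this ordering, which the thread does not confirm, is that the kernel is never compiled because CUDA_HOME is unset, so every later attempt to load the library fails.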
Output from docling conversions
<!-- image -->
## Docling Technical Report
Version 1.0
Christoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar
AI4K Group, IBM Research R¨ uschlikon, Switzerland
...
# the output is truncated
pip freeze
annotated-types==0.7.0
attrs==25.1.0
beautifulsoup4==4.13.3
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
dill==0.3.9
docling==2.25.0
docling-core==2.20.0
docling-ibm-models==3.4.0
docling-parse==3.4.0
easyocr==1.7.2
et_xmlfile==2.0.0
filelock==3.17.0
filetype==1.2.0
fsspec==2025.2.0
huggingface-hub==0.29.1
idna==3.10
imageio==2.37.0
Jinja2==3.1.5
jsonlines==3.1.0
jsonref==1.1.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
latex2mathml==3.77.0
lazy_loader==0.4
lxml==5.3.1
markdown-it-py==3.0.0
marko==2.1.2
MarkupSafe==3.0.2
mdurl==0.1.2
mpire==2.10.2
mpmath==1.3.0
multiprocess==0.70.17
networkx==3.4.2
ninja==1.11.1.3
numpy==2.2.3
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
opencv-python-headless==4.11.0.86
openpyxl==3.1.5
packaging==24.2
pandas==2.2.3
pillow==11.1.0
pyclipper==1.3.0.post6
pydantic==2.10.6
pydantic-settings==2.8.0
pydantic_core==2.27.2
Pygments==2.19.1
pypdfium2==4.30.1
python-bidi==0.6.6
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-dotenv==1.0.1
python-pptx==1.0.2
pytz==2025.1
PyYAML==6.0.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.23.1
Rtree==1.3.0
safetensors==0.5.3
scikit-image==0.25.2
scipy==1.15.2
semchunk==2.2.2
setuptools==75.8.1
shapely==2.0.7
shellingham==1.5.4
six==1.17.0
soupsieve==2.6
sympy==1.13.1
tabulate==0.9.0
tifffile==2025.2.18
tokenizers==0.21.0
torch==2.5.1
torchvision==0.20.1
tqdm==4.67.1
transformers==4.49.0
triton==3.1.0
typer==0.12.5
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
XlsxWriter==3.2.2
Notes
It seems that even with torch and torchvision pinned, the error persists.
Nevertheless, there seems to be no problem with the output, but I can't say for sure.
I have tested different docling versions, and the problem starts with v2.12.0, when support for GPU accelerators was introduced.
@TroyWilliams3687 thanks for the reply! Unfortunately, as you can see in my example, I'm using the simplest possible setup with all defaults (EasyOCR is the default).