When I run the example code, nothing happens—not even an error.
Bug
First, I want to apologize for my English; I’m using a translator.
I’m running the base code provided in the documentation, but nothing happens.
I ran it in Python (version 3.12.2) and via CLI; in both cases, no errors are returned.
Steps to reproduce
Here’s my code:
from docling.document_converter import DocumentConverter
source = "itr.pdf" # PDF path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "### Docling Technical Report[...]"
Docling version
Here’s my "pip list":
annotated-types 0.7.0
attrs 24.3.0
autoflake 2.3.1
beautifulsoup4 4.12.3
certifi 2024.12.14
charset-normalizer 3.4.0
click 8.1.7
colorama 0.4.6
deepsearch-glm 1.0.0
dill 0.3.9
docling 2.14.0
docling-core 2.12.1
docling-ibm-models 3.1.0
docling-parse 3.0.0
easyocr 1.7.2
et_xmlfile 2.0.0
filelock 3.16.1
filetype 1.2.0
fsspec 2024.10.0
huggingface-hub 0.27.0
idna 3.10
imageio 2.36.1
Jinja2 3.1.4
jsonlines 3.1.0
jsonref 1.1.0
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
lazy_loader 0.4
lxml 5.3.0
markdown-it-py 3.0.0
marko 2.1.2
MarkupSafe 3.0.2
mdurl 0.1.2
mpire 2.10.2
mpmath 1.3.0
multiprocess 0.70.17
networkx 3.4.2
ninja 1.11.1.3
numpy 2.2.0
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
packaging 24.2
pandas 2.2.3
pillow 10.4.0
pip 24.3.1
pyclipper 1.3.0.post6
pydantic 2.10.4
pydantic_core 2.27.2
pydantic-settings 2.7.0
pyflakes 3.2.0
Pygments 2.18.0
pypdfium2 4.30.1
python-bidi 0.6.3
python-dateutil 2.9.0.post0
python-docx 1.1.2
python-dotenv 1.0.1
python-pptx 1.0.2
pytz 2024.2
pywin32 307
PyYAML 6.0.2
referencing 0.35.1
regex 2024.11.6
requests 2.32.3
rich 13.9.4
rpds-py 0.22.3
Rtree 1.3.0
safetensors 0.4.5
scikit-image 0.25.0
scipy 1.14.1
semchunk 2.2.2
setuptools 75.6.0
shapely 2.0.6
shellingham 1.5.4
six 1.17.0
soupsieve 2.6
sympy 1.13.1
tabulate 0.9.0
tifffile 2024.12.12
tokenizers 0.21.0
torch 2.5.1
torchvision 0.20.1
tqdm 4.67.1
transformers 4.47.1
typer 0.12.5
typing_extensions 4.12.2
tzdata 2024.2
urllib3 2.2.3
XlsxWriter 3.2.0
The same happens with my code. It takes a very long time to extract data 🥲
That’s really sad, man :(
My code even runs, but it doesn’t return anything!
@iurysm1, I'm not sure it helps, but is your PDF file very large? If so, try a one- or two-page PDF. Also check your CPU (or GPU) usage while it runs.
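To tell a slow conversion apart from one that silently does nothing, it may help to log when the conversion starts and finishes and how long it took. The following is only a minimal sketch: `run_with_timing` and `convert_to_markdown` are hypothetical helper names, not part of Docling, and the only Docling calls used are the ones from the example code in this issue.

```python
import logging
import time


def run_with_timing(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds), logging start and finish."""
    logging.info("starting %s", getattr(func, "__name__", repr(func)))
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    logging.info("finished in %.1f s", elapsed)
    return result, elapsed


def convert_to_markdown(source):
    """Hypothetical wrapper around the conversion steps from the example code."""
    # Imported inside the function so the helper above can be used without Docling.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

After `logging.basicConfig(level=logging.INFO)`, calling `run_with_timing(convert_to_markdown, "itr.pdf")` should log a "starting" line right away. If no "finished" line follows while CPU usage stays high, the conversion is probably still running rather than failing silently.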
I tried with the file that is in the example code too:
source = "https://arxiv.org/pdf/2408.09869"
but nothing happened :(
I also had the same problem, but reverting to version 2.12.0 fixed it.
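Since one report here says 2.12.0 works and another says 2.14 works, it may be worth confirming which versions the interpreter that runs the script actually sees. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and the package names are taken from the "pip list" output above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from the "pip list" output in this thread.
for name in ("docling", "docling-core", "docling-ibm-models", "docling-parse"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the printed versions differ from what `pip list` shows, the script is probably running in a different environment than the one where Docling was installed.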
@iurysm1 It works fine for me with 2.14. Try again with a fresh Poetry install.
From the tests I did, I realized that the problem is on my computer. Maybe it doesn't have the capacity to convert such large PDF documents :(
I tried with a two-page PDF like @bit-scientist told me and it worked.
I tested a large PDF on my computer that I use for gaming and it worked!
Thanks all for collaborating on this issue. I think we can consider it settled and close it.