When I run the example code, nothing happens—not even an error.

Open iurysm1 opened this issue 1 year ago • 2 comments

Bug

First, I want to apologize for my English; I’m using a translator.

I’m running the base code provided in the documentation, but nothing happens.

I ran it in Python (version 3.12.2) and via CLI; in both cases, no errors are returned.

Steps to reproduce

Here’s my code:

from docling.document_converter import DocumentConverter

source = "itr.pdf"  # PDF path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "### Docling Technical Report[...]"
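
When a script ends silently like this, one way to tell a crash apart from a normal exit is to run it in a child process and look at the exit code. The following is a minimal sketch using only the standard library; `run_and_report` and the file name `convert_itr.py` are hypothetical names chosen for illustration, not part of docling.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd in a child process, print its exit code, and return it.

    A non-zero code without a Python traceback often suggests the
    interpreter was killed or crashed in native code (for example
    when the machine runs out of memory).
    """
    completed = subprocess.run(cmd)
    print(f"exit code: {completed.returncode}")
    return completed.returncode


# Hypothetical usage: save the snippet above as convert_itr.py, then call
# run_and_report([sys.executable, "-X", "faulthandler", "convert_itr.py"])
# "-X faulthandler" asks Python to dump a traceback on fatal errors.
```

An exit code of 0 with no output would point to a different cause than a non-zero code, so this narrows down where to look next.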

Docling version

Here’s my "pip list":

annotated-types           0.7.0
attrs                     24.3.0
autoflake                 2.3.1
beautifulsoup4            4.12.3
certifi                   2024.12.14
charset-normalizer        3.4.0
click                     8.1.7
colorama                  0.4.6
deepsearch-glm            1.0.0
dill                      0.3.9
docling                   2.14.0
docling-core              2.12.1
docling-ibm-models        3.1.0
docling-parse             3.0.0
easyocr                   1.7.2
et_xmlfile                2.0.0
filelock                  3.16.1
filetype                  1.2.0
fsspec                    2024.10.0
huggingface-hub           0.27.0
idna                      3.10
imageio                   2.36.1
Jinja2                    3.1.4
jsonlines                 3.1.0
jsonref                   1.1.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
lazy_loader               0.4
lxml                      5.3.0
markdown-it-py            3.0.0
marko                     2.1.2
MarkupSafe                3.0.2
mdurl                     0.1.2
mpire                     2.10.2
mpmath                    1.3.0
multiprocess              0.70.17
networkx                  3.4.2
ninja                     1.11.1.3
numpy                     2.2.0
opencv-python-headless    4.10.0.84
openpyxl                  3.1.5
packaging                 24.2
pandas                    2.2.3
pillow                    10.4.0
pip                       24.3.1
pyclipper                 1.3.0.post6
pydantic                  2.10.4
pydantic_core             2.27.2
pydantic-settings         2.7.0
pyflakes                  3.2.0
Pygments                  2.18.0
pypdfium2                 4.30.1
python-bidi               0.6.3
python-dateutil           2.9.0.post0
python-docx               1.1.2
python-dotenv             1.0.1
python-pptx               1.0.2
pytz                      2024.2
pywin32                   307
PyYAML                    6.0.2
referencing               0.35.1
regex                     2024.11.6
requests                  2.32.3
rich                      13.9.4
rpds-py                   0.22.3
Rtree                     1.3.0
safetensors               0.4.5
scikit-image              0.25.0
scipy                     1.14.1
semchunk                  2.2.2
setuptools                75.6.0
shapely                   2.0.6
shellingham               1.5.4
six                       1.17.0
soupsieve                 2.6
sympy                     1.13.1
tabulate                  0.9.0
tifffile                  2024.12.12
tokenizers                0.21.0
torch                     2.5.1
torchvision               0.20.1
tqdm                      4.67.1
transformers              4.47.1
typer                     0.12.5
typing_extensions         4.12.2
tzdata                    2024.2
urllib3                   2.2.3
XlsxWriter                3.2.0

iurysm1 avatar Dec 19 '24 20:12 iurysm1

The same happens with my code. It takes a very long time to extract data 🥲

Smit3949 avatar Dec 20 '24 10:12 Smit3949

That’s really sad, man :(

My code even runs, but it doesn’t return anything!

iurysm1 avatar Dec 20 '24 10:12 iurysm1

@iurysm1, not sure if it helps, but is your PDF file too big? If so, try a one- or two-page PDF, and check your CPU (or GPU) usage while it runs.
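
To compare a small file against a large one, it can help to time the conversion and turn on logging, so that any progress messages the library emits become visible. This is a rough sketch, not an official recipe: `timed` and `convert_small_file` are hypothetical helper names, and the only docling calls used are the ones from the snippet in the issue.

```python
import logging
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def convert_small_file(path):
    """Convert one PDF with INFO logging on and report the wall time."""
    # Imported here so the helper above works without docling installed.
    from docling.document_converter import DocumentConverter

    logging.basicConfig(level=logging.INFO)
    result, seconds = timed(DocumentConverter().convert, path)
    print(f"{path}: converted in {seconds:.1f} s")
    return result
```

If a two-page file finishes quickly and a large one never prints the timing line, the size of the input is the likely culprit.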

bit-scientist avatar Dec 24 '24 05:12 bit-scientist

I tried with the file that is in the example code too:

source = "https://arxiv.org/pdf/2408.09869"

but nothing happened :(

iurysm1 avatar Dec 24 '24 13:12 iurysm1

I also had the same problem, but reverting to version 2.12.0 fixed it.
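
Before and after a downgrade it is worth confirming which versions are really installed in the active environment. A small sketch with the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, and the package names are taken from the pip list above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of the packages most relevant to this issue.
for name in ("docling", "docling-core", "docling-parse"):
    print(name, installed_version(name))
```

The downgrade itself would be `pip install docling==2.12.0`, run in the same environment.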

lauriejohnstongentekai avatar Dec 24 '24 18:12 lauriejohnstongentekai

@iurysm1 It works fine for me with 2.14. Try again with a fresh poetry install.

trinanjan12 avatar Dec 28 '24 06:12 trinanjan12

From the tests I did, I realized that the problem is on my computer. Maybe it doesn't have the capacity to convert such large PDF documents :(

I tried with a two-page PDF like @bit-scientist told me and it worked.

I tested a large PDF on my computer that I use for gaming and it worked!

iurysm1 avatar Dec 30 '24 11:12 iurysm1

Thanks all for collaborating on this issue. I think we can consider it settled and close it.

dolfim-ibm avatar Jan 30 '25 09:01 dolfim-ibm