bug/error while loading unstructured.partition.pdf import partition_pdf

Open dtruong46me opened this issue 1 year ago • 4 comments

I use python==3.10.3, unstructured==0.15.12

from unstructured.partition.pdf import partition_pdf
PS C:\Users\ProjectName\test.py"
Traceback (most recent call last):
  File "c:\Users\DELL\OneDrive - Hanoi University of Science and Technology\03. IT-E10 K66 HUST\60. FPT-TEL\inf-chatbot\src\test\test.py", line 3, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\unstructured\partition\pdf.py", line 56, in <module>
    from unstructured.partition.pdf_image.analysis.layout_dump import (
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\unstructured\partition\pdf_image\analysis\layout_dump.py", line 8, in <module>
    from unstructured_inference.inference.layout import DocumentLayout
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\unstructured_inference\inference\layout.py", line 19, in <module>
    from unstructured_inference.models.base import get_model
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\unstructured_inference\models\base.py", line 9, in <module>
    from unstructured_inference.models.detectron2onnx import MODEL_TYPES as DETECTRON2_ONNX_MODEL_TYPES
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\unstructured_inference\models\detectron2onnx.py", line 9, in <module>
    from onnxruntime.quantization import QuantType, quantize_dynamic
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\onnxruntime\quantization\__init__.py", line 1, in <module>
    from .calibrate import (  # noqa: F401
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\onnxruntime\quantization\calibrate.py", line 22, in <module>
    from .quant_utils import apply_plot, load_model_with_shape_infer, smooth_distribution
  File "C:\Users\DELL\AppData\Roaming\Python\Python310\site-packages\onnxruntime\quantization\quant_utils.py", line 145, in <module>
    onnx_proto.TensorProto.INT4: int4,  # base_dtype is np.int8
AttributeError: INT4. Did you mean: 'INT64'?

How can I fix this?

dtruong46me, Sep 18 '24

Having this same issue with python==3.11.5.

dmitrisaberi, Sep 25 '24

For me it was an issue with the onnx version, @dtruong46me. Try running pip install --upgrade onnx. It works with onnx==1.16.2.

dmitrisaberi, Sep 26 '24
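
To check whether the installed onnx is the problem before upgrading, a small diagnostic along these lines may help. It is only a sketch: the helper names installed_versions and onnx_has_int4 are made up for illustration, and the assumption (based on the traceback above) is that onnxruntime's quantization module needs onnx.TensorProto.INT4, which an older onnx does not define.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def onnx_has_int4():
    """Return True/False if onnx imports, or None if onnx cannot be imported."""
    try:
        import onnx
    except ImportError:
        # Covers both a missing package and a failed native-library load.
        return None
    # The traceback fails on TensorProto.INT4, so test for exactly that attribute.
    return hasattr(onnx.TensorProto, "INT4")


if __name__ == "__main__":
    names = ["onnx", "onnxruntime", "unstructured", "unstructured-inference"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
    print("onnx.TensorProto.INT4 available:", onnx_has_int4())
```

If the last line prints False, the installed onnx is older than what the installed onnxruntime expects, and upgrading onnx as suggested above is the likely fix.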

Still having the issue even after installing onnx==1.16.2.

Has anyone resolved this issue with partition_pdf?

hshaikusa, Sep 27 '24

I upgraded to onnx==1.17.0 and that fixed the error for me. Thank you @dmitrisaberi for the suggestion!

jaikb, Oct 02 '24

Closing, assumed resolved as current onnx version is 1.19.2.

scanny, Dec 16 '24

After running from unstructured.partition.pdf import partition_pdf, I get this error:

Cell In[7], line 1
----> 1 from unstructured.partition.pdf import partition_pdf

File c:\Users\ASUS\anaconda3\Lib\site-packages\unstructured\partition\pdf.py:17
     15 from pdfminer.layout import LTContainer, LTImage, LTItem, LTTextBox
     16 from pdfminer.utils import open_filename
---> 17 from pi_heif import register_heif_opener
     18 from PIL import Image as PILImage
     19 from pypdf import PdfReader

ModuleNotFoundError: No module named 'pi_heif'

Then I ran !pip install "unstructured[all-docs]".

Now I get this error: ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.

Rittik003, Jan 17 '25
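
The two errors in the comment above are different failure modes: a package that is not installed (ModuleNotFoundError) versus one that is installed but whose native library fails to load (ImportError). A rough way to see which modules fall into which category is to try importing each one and record the exception. The helper name probe_imports is hypothetical, and the module list is just the ones named in this thread.

```python
import importlib


def probe_imports(module_names):
    """Try to import each module; map name -> None on success, else an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so the class name
            # tells a missing package apart from a failed DLL/shared-library load.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in probe_imports(["pi_heif", "onnx", "onnxruntime"]).items():
        print(f"{name}: {error or 'OK'}")
```

A ModuleNotFoundError points at a missing install, while a plain ImportError mentioning a DLL suggests the package is present but its compiled extension cannot be loaded in this environment.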