Running magic-pdf --version from the command line fails with a TypeError.

Open · tuhang opened this issue 6 months ago · 1 comment

Description of the bug

On Windows 10 with conda, the command-line demo fails to run and prints the following:

(MinerU) C:\Users\tu_ha>magic-pdf --version
Traceback (most recent call last):
  File "D:\0_dev\anaconda\envs\MinerU\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\0_dev\anaconda\envs\MinerU\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\0_dev\anaconda\envs\MinerU\Scripts\magic-pdf.exe\__main__.py", line 4, in <module>
    from magic_pdf.cli.magicpdf import cli
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\magic_pdf\cli\magicpdf.py", line 33, in <module>
    from magic_pdf.pipe.UNIPipe import UNIPipe
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 11, in <module>
    from magic_pdf.user_api import parse_union_pdf, parse_ocr_pdf
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\magic_pdf\user_api.py", line 21, in <module>
    from magic_pdf.pdf_parse_by_ocr import parse_pdf_by_ocr
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_by_ocr.py", line 1, in <module>
    from magic_pdf.pdf_parse_union_core import pdf_parse_union
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_union_core.py", line 13, in <module>
    from magic_pdf.para.para_split_v2 import para_split
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\magic_pdf\para\para_split_v2.py", line 1, in <module>
    from sklearn.cluster import DBSCAN
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\__init__.py", line 84, in <module>
    from .base import clone
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\base.py", line 19, in <module>
    from .utils._estimator_html_repr import _HTMLDocumentationLinkMixin, estimator_html_repr
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\utils\__init__.py", line 11, in <module>
    from ._chunking import gen_batches, gen_even_slices
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\utils\_chunking.py", line 8, in <module>
    from ._param_validation import Interval, validate_params
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\utils\_param_validation.py", line 14, in <module>
    from .validation import _is_arraylike_not_scalar
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\utils\validation.py", line 26, in <module>
    from ..utils._array_api import _asarray_with_order, _is_numpy_namespace, get_namespace
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\utils\_array_api.py", line 11, in <module>
    from .fixes import parse_version
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\sklearn\utils\fixes.py", line 20, in <module>
    import scipy.stats
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\stats\__init__.py", line 610, in <module>
    from ._stats_py import *
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\stats\_stats_py.py", line 49, in <module>
    from . import distributions
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\stats\distributions.py", line 10, in <module>
    from . import _continuous_distns
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\stats\_continuous_distns.py", line 12, in <module>
    from scipy.interpolate import BSpline
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\interpolate\__init__.py", line 167, in <module>
    from ._interpolate import *
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\interpolate\_interpolate.py", line 12, in <module>
    from . import _fitpack_py
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\interpolate\_fitpack_py.py", line 8, in <module>
    from ._fitpack_impl import bisplrep, bisplev, dblint  # noqa: F401
  File "D:\0_dev\anaconda\envs\MinerU\lib\site-packages\scipy\interpolate\_fitpack_impl.py", line 103, in <module>
    'iwrk': array([], dfitpack_int), 'u': array([], float),
TypeError
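
The traceback above fails deep inside the scipy import chain, before any magic-pdf code runs. A minimal sketch for narrowing this down is to import each layer separately, innermost first, and report the first one that fails. The module names are taken from the traceback; `try_imports` is a hypothetical helper written for this illustration, not part of magic-pdf, and the script only localizes the failure rather than fixing it.

```python
import importlib
import traceback


def try_imports(module_names):
    """Import each module in order; return (name, ok, error_text) tuples."""
    results = []
    for name in module_names:
        try:
            importlib.import_module(name)
            results.append((name, True, ""))
        except Exception:
            # Catch any import-time failure (here a TypeError), not only ImportError.
            results.append((name, False, traceback.format_exc()))
    return results


if __name__ == "__main__":
    # Modules named in the traceback, lowest-level dependency first.
    chain = [
        "numpy",
        "scipy.interpolate",
        "scipy.stats",
        "sklearn.cluster",
        "magic_pdf.cli.magicpdf",
    ]
    for name, ok, err in try_imports(chain):
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            print(err)
            break  # later failures are usually consequences of the first one
```

If `numpy` imports cleanly but `scipy.interpolate` does not, the problem lies in the numpy/scipy installation inside the MinerU environment rather than in magic-pdf itself.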

How to reproduce the bug

The following commands have been run:

pip install magic-pdf[full-cpu]
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/


Currently installed dependencies:

(MinerU) C:\Users\tu_ha>pip list
Package                   Version
------------------------- ------------------
absl-py                   2.1.0
aiohttp                   3.9.5
aiosignal                 1.3.1
albucore                  0.0.12
albumentations            1.4.12
altair                    5.3.0
annotated-types           0.7.0
antlr4-python3-runtime    4.9.3
anyio                     4.4.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
astor                     0.8.1
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrdict                  2.0.1
attrs                     23.2.0
Babel                     2.15.0
bce-python-sdk            0.9.17
beautifulsoup4            4.12.3
black                     24.4.2
bleach                    6.1.0
blinker                   1.8.2
boto3                     1.34.149
botocore                  1.34.149
braceexpand               0.1.7
Brotli                    1.1.0
cachetools                5.4.0
certifi                   2024.7.4
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpickle               3.0.0
colorama                  0.4.6
colorlog                  6.8.2
comm                      0.2.2
contourpy                 1.2.1
cryptography              43.0.0
cssselect                 1.2.0
cssutils                  2.11.1
cycler                    0.12.1
Cython                    3.0.10
datasets                  2.20.0
debugpy                   1.8.2
decorator                 5.1.1
defusedxml                0.7.1
detectron2                0.6
dill                      0.3.8
et-xmlfile                1.1.0
eva-decord                0.6.1
eval_type_backport        0.2.0
evaluate                  0.4.2
exceptiongroup            1.2.2
executing                 2.0.1
fairscale                 0.4.13
fast-langdetect           0.2.1
fastjsonschema            2.20.0
fasttext-wheel            0.9.2
filelock                  3.15.4
fire                      0.6.0
Flask                     3.0.3
flask-babel               4.0.0
fonttools                 4.53.1
fqdn                      1.5.1
frozenlist                1.4.1
fsspec                    2024.5.0
ftfy                      6.2.0
future                    1.0.0
fvcore                    0.1.5.post20221221
gitdb                     4.0.11
GitPython                 3.1.43
grpcio                    1.65.1
h11                       0.14.0
httpcore                  1.0.5
httpx                     0.27.0
huggingface-hub           0.24.2
hydra-core                1.3.2
idna                      3.7
imageio                   2.34.2
imgaug                    0.4.0
intel-openmp              2021.4.0
iopath                    0.1.9
ipykernel                 6.29.5
ipython                   8.26.0
isoduration               20.11.0
itsdangerous              2.2.0
jedi                      0.19.1
Jinja2                    3.1.4
jmespath                  1.0.1
joblib                    1.4.2
json5                     0.9.25
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2023.12.1
jupyter_client            8.6.2
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.5
jupyter_server            2.14.2
jupyter_server_terminals  0.5.3
jupyterlab                4.2.4
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.3
kiwisolver                1.4.5
lazy_loader               0.4
lmdb                      1.5.1
loguru                    0.7.2
lxml                      5.2.2
magic-pdf                 0.6.1
Markdown                  3.6
markdown-it-py            3.0.0
MarkupSafe                2.1.5
matplotlib                3.9.1
matplotlib-inline         0.1.7
mdurl                     0.1.2
mistune                   3.0.2
mkl                       2021.4.0
more-itertools            10.3.0
mpmath                    1.3.0
multidict                 6.0.5
multiprocess              0.70.16
mypy-extensions           1.0.0
nbclient                  0.10.0
nbconvert                 7.16.4
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  3.3
nltk                      3.8.1
notebook_shim             0.2.4
numpy                     1.26.4
omegaconf                 2.3.0
opencv-contrib-python     4.6.0.66
opencv-python             4.6.0.66
opencv-python-headless    4.10.0.84
openpyxl                  3.1.5
opt-einsum                3.3.0
overrides                 7.7.0
packaging                 24.1
paddleocr                 2.7.3
paddlepaddle              2.6.1
pandas                    2.2.2
pandocfilters             1.5.1
parso                     0.8.4
pathspec                  0.12.1
pdf2docx                  0.5.8
pdf2image                 1.17.0
pdfminer.six              20240706
pillow                    10.4.0
pip                       24.0
platformdirs              4.2.2
portalocker               2.10.1
premailer                 3.10.0
prometheus_client         0.20.0
prompt_toolkit            3.0.47
protobuf                  3.20.2
psutil                    6.0.0
pure_eval                 0.2.3
py-cpuinfo                9.0.0
pyarrow                   17.0.0
pyarrow-hotfix            0.6
pybind11                  2.13.1
pyclipper                 1.3.0.post5
pycocotools               2.0.8
pycparser                 2.22
pycryptodome              3.20.0
pydantic                  2.8.2
pydantic_core             2.20.1
pydeck                    0.9.1
Pygments                  2.18.0
PyMuPDF                   1.24.9
PyMuPDFb                  1.24.9
pyparsing                 3.1.2
pypdfium2                 4.30.0
python-dateutil           2.9.0.post0
python-docx               1.1.2
python-json-logger        2.0.7
pytz                      2024.1
pywin32                   306
pywinpty                  2.0.13
PyYAML                    6.0.1
pyzmq                     26.0.3
rapidfuzz                 3.9.4
rarfile                   4.2
referencing               0.35.1
regex                     2024.7.24
requests                  2.32.3
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.7.1
robust-downloader         0.0.2
rpds-py                   0.19.1
s3transfer                0.10.2
safetensors               0.4.3
scikit-image              0.24.0
scikit-learn              1.5.1
scipy                     1.14.0
seaborn                   0.13.2
Send2Trash                1.8.3
setuptools                69.5.1
shapely                   2.0.5
six                       1.16.0
smmap                     5.0.1
sniffio                   1.3.1
soupsieve                 2.5
stack-data                0.6.3
streamlit                 1.37.0
streamlit-drawable-canvas 0.9.3
sympy                     1.13.1
tabulate                  0.9.0
tbb                       2021.13.0
tenacity                  8.5.0
tensorboard               2.17.0
tensorboard-data-server   0.7.2
termcolor                 2.4.0
terminado                 0.18.1
threadpoolctl             3.5.0
tifffile                  2024.7.24
timm                      0.9.16
tinycss2                  1.3.0
tokenizers                0.19.1
toml                      0.10.2
tomli                     2.0.1
toolz                     0.12.1
torch                     2.3.1
torchtext                 0.18.0
torchvision               0.18.1
tornado                   6.4.1
tqdm                      4.66.4
traitlets                 5.14.3
transformers              4.40.0
types-python-dateutil     2.9.0.20240316
typing_extensions         4.12.2
tzdata                    2024.1
ultralytics               8.2.67
ultralytics-thop          2.0.0
unimernet                 0.1.1
uri-template              1.3.0
urllib3                   2.2.2
visualdl                  2.5.3
Wand                      0.6.13
watchdog                  4.0.1
wcwidth                   0.2.13
webcolors                 24.6.0
webdataset                0.2.86
webencodings              0.5.1
websocket-client          1.8.0
Werkzeug                  3.0.3
wheel                     0.43.0
win32-setctime            1.1.0
wordninja                 2.0.0
xxhash                    3.4.1
yacs                      0.1.8
yarl                      1.9.4
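
Because `pip list` and the `magic-pdf` entry point can resolve to different interpreters on a machine with several Python installations, it can help to check the versions from inside the interpreter that actually fails. The sketch below uses the standard-library `importlib.metadata`. The package selection (the ones on the failing import path) and the helper name `installed_versions` are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distributions on the failing import path, as named in the pip list above.
    packages = ["numpy", "scipy", "scikit-learn", "magic-pdf"]
    for name, ver in installed_versions(packages).items():
        print(f"{name:15} {ver or 'not installed'}")
```

Run with the environment's own interpreter, the output should match the list above (numpy 1.26.4, scipy 1.14.0, scikit-learn 1.5.1, magic-pdf 0.6.1). A mismatch would suggest that a different environment is being picked up.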

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.6.x

Device mode

cpu

tuhang, Jul 28 '24 16:07