OSError: [WinError 127] The specified procedure could not be found.

Open kuailemoyuing opened this issue 1 year ago • 7 comments

Description of the bug

(MinerU) PS C:\Users\30986\MinerU> magic-pdf pdf-command --pdf "E:\Desktop\专利\CN201911321243_FullTextImage.pdf" --inside_model true
2024-07-30 16:54:42.597 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json E:\Desktop\专利\CN201911321243_FullTextImage.json existed
2024-07-30 16:54:42.599 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /tmp\magic-pdf\CN201911321243_FullTextImage\auto
2024-07-30 16:54:43.469 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 8317, cid_chars_radio: 0.0
Traceback (most recent call last):
  File "C:\Users\30986\.conda\envs\MinerU\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\30986\.conda\envs\MinerU\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\30986\.conda\envs\MinerU\Scripts\magic-pdf.exe\__main__.py", line 7, in <module>
    sys.exit(cli())
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\magic_pdf\cli\magicpdf.py", line 325, in pdf_command
    do_parse(
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\magic_pdf\cli\magicpdf.py", line 111, in do_parse
    pipe.pipe_analyze()
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 29, in pipe_analyze
    self.model_list = doc_analyze(self.pdf_bytes, ocr=False)
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\magic_pdf\model\doc_analyze_by_custom_model.py", line 65, in doc_analyze
    from magic_pdf.model.pdf_extract_kit import CustomPEKModel
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\magic_pdf\model\pdf_extract_kit.py", line 16, in <module>
    from unimernet.common.config import Config
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\unimernet\__init__.py", line 18, in <module>
    from unimernet.tasks import *
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\unimernet\tasks\__init__.py", line 10, in <module>
    from unimernet.tasks.unimernet_train import UniMERNet_Train
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\unimernet\tasks\unimernet_train.py", line 11, in <module>
    from torchtext.data import metrics
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\torchtext\__init__.py", line 18, in <module>
    from torchtext import _extension # noqa: F401
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\torchtext\_extension.py", line 64, in <module>
    _init_extension()
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\torchtext\_extension.py", line 58, in _init_extension
    _load_lib("libtorchtext")
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\torchtext\_extension.py", line 50, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\30986\.conda\envs\MinerU\lib\site-packages\torch\_ops.py", line 1295, in load_library
    ctypes.CDLL(path)
  File "C:\Users\30986\.conda\envs\MinerU\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 127] The specified procedure could not be found.

How to reproduce the bug

Package Version

absl-py 2.1.0
aiohttp 3.9.5
aiosignal 1.3.1
albucore 0.0.12
albumentations 1.4.12
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.4.0
astor 0.8.1
async-timeout 4.0.3
attrdict 2.0.1
attrs 23.2.0
Babel 2.15.0
bce-python-sdk 0.9.17
beautifulsoup4 4.12.3
black 24.4.2
blinker 1.8.2
boto3 1.34.150
botocore 1.34.150
braceexpand 0.1.7
Brotli 1.1.0
cachetools 5.4.0
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorama 0.4.6
colorlog 6.8.2
contourpy 1.2.1
cryptography 43.0.0
cssselect 1.2.0
cssutils 2.11.1
cycler 0.12.1
Cython 3.0.10
datasets 2.20.0
decorator 5.1.1
detectron2 0.6
dill 0.3.8
et-xmlfile 1.1.0
eva-decord 0.6.1
eval_type_backport 0.2.0
evaluate 0.4.2
exceptiongroup 1.2.2
fairscale 0.4.13
fast-langdetect 0.2.1
fasttext-wheel 0.9.2
filelock 3.15.4
fire 0.6.0
Flask 3.0.3
flask-babel 4.0.0
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.5.0
ftfy 6.2.0
future 1.0.0
fvcore 0.1.5.post20221221
grpcio 1.65.1
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.24.3
hydra-core 1.3.2
idna 3.7
imageio 2.34.2
imgaug 0.4.0
intel-openmp 2021.4.0
iopath 0.1.9
itsdangerous 2.2.0
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
kiwisolver 1.4.5
lazy_loader 0.4
lmdb 1.5.1
loguru 0.7.2
lxml 5.2.2
magic-pdf 0.6.1
Markdown 3.6
MarkupSafe 2.1.5
matplotlib 3.9.1
mkl 2021.4.0
more-itertools 10.3.0
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
networkx 3.3
numpy 1.26.4
omegaconf 2.3.0
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
opt-einsum 3.3.0
packaging 24.1
paddleocr 2.7.3
paddlepaddle 2.6.1
pandas 2.2.2
pathspec 0.12.1
pdf2docx 0.5.8
pdfminer.six 20240706
pillow 10.4.0
pip 24.0
platformdirs 4.2.2
portalocker 2.10.1
premailer 3.10.0
protobuf 3.20.2
psutil 6.0.0
py-cpuinfo 9.0.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pybind11 2.13.1
pyclipper 1.3.0.post5
pycocotools 2.0.8
pycparser 2.22
pycryptodome 3.20.0
pydantic 2.8.2
pydantic_core 2.20.1
PyMuPDF 1.24.9
PyMuPDFb 1.24.9
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-docx 1.1.2
pytz 2024.1
pywin32 306
PyYAML 6.0.1
rapidfuzz 3.9.5
rarfile 4.2
regex 2024.7.24
requests 2.32.3
robust-downloader 0.0.2
s3transfer 0.10.2
safetensors 0.4.3
scikit-image 0.24.0
scikit-learn 1.5.1
scipy 1.14.0
seaborn 0.13.2
setuptools 69.5.1
shapely 2.0.5
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
sympy 1.13.1
tabulate 0.9.0
tbb 2021.13.0
tensorboard 2.17.0
tensorboard-data-server 0.7.2
termcolor 2.4.0
threadpoolctl 3.5.0
tifffile 2024.7.24
timm 0.9.16
tokenizers 0.19.1
tomli 2.0.1
torch 2.4.0+cu124
torchaudio 2.4.0+cu124
torchtext 0.18.0
torchvision 0.19.0+cu124
tqdm 4.66.4
transformers 4.40.0
typing_extensions 4.12.2
tzdata 2024.1
ultralytics 8.2.69
ultralytics-thop 2.0.0
unimernet 0.1.6
urllib3 2.2.2
visualdl 2.5.3
Wand 0.6.13
wcwidth 0.2.13
webdataset 0.2.86
Werkzeug 3.0.3
wheel 0.43.0
win32-setctime 1.1.0
wordninja 2.0.0
xxhash 3.4.1
yacs 0.1.8
yarl 1.9.4

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.6.x

Device mode

cuda

kuailemoyuing • Jul 30 '24 09:07

Refer to this issue: https://github.com/opendatalab/MinerU/issues/240. torchtext 0.18.0 needs torch 2.3.1 and torchvision 0.18.1.
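The pairing above can be checked before running magic-pdf. The following is a minimal sketch, assuming only the pairing stated in this thread (torchtext 0.18.x with torch 2.3.x). The table EXPECTED_TORCH and all helper names are hypothetical and are not part of MinerU or PyTorch.

```python
from importlib import metadata
from typing import Optional

# Assumption: the only pairing recorded here is the one stated in this thread
# (torchtext 0.18.x expects torch 2.3.x). Extend the table for other releases.
EXPECTED_TORCH = {"0.18": "2.3"}


def major_minor(version: str) -> str:
    """Reduce '2.4.0+cu124' to '2.4' (drop the local label and patch number)."""
    public = version.split("+", 1)[0]
    return ".".join(public.split(".")[:2])


def is_compatible(torch_version: str, torchtext_version: str) -> Optional[bool]:
    """True/False when the pairing is in the table, None when it is unknown."""
    expected = EXPECTED_TORCH.get(major_minor(torchtext_version))
    if expected is None:
        return None
    return major_minor(torch_version) == expected


def installed_version(package: str) -> Optional[str]:
    """Installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> str:
    """Compare the torch and torchtext versions installed in this environment."""
    torch_v = installed_version("torch")
    text_v = installed_version("torchtext")
    if torch_v is None or text_v is None:
        return "torch or torchtext is not installed"
    verdict = is_compatible(torch_v, text_v)
    if verdict is None:
        return f"no known pairing for torchtext {text_v}"
    state = "matches" if verdict else "does not match"
    return f"torch {torch_v} {state} torchtext {text_v}"
```

With the versions from the report, is_compatible("2.4.0+cu124", "0.18.0") returns False, while is_compatible("2.3.1+cu118", "0.18.0") returns True. Calling check_environment() inside the conda environment reports on whatever is actually installed there.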

myhloli • Jul 30 '24 09:07

I ran 'pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118' and got a dependency conflict:

'ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. fast-langdetect 0.2.1 requires numpy<2.0.0,>=1.26.4, but you have numpy 1.26.3 which is incompatible.'

The installed versions are torch 2.3.1+cu118, torchtext 0.18.0, and torchvision 0.18.1+cu118, not torch 2.3.1 and torchvision 0.18.1.

wuminmin • Jul 30 '24 13:07

It doesn't matter. Ignore this error; the program will run fine.
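One way to see why the "+cu118" suffix is harmless: the part after "+" is a local version label (here naming the CUDA build), so 2.3.1+cu118 is still release 2.3.1. The numpy message is a separate range check, which 1.26.3 does fail. The sketch below is a simplification that handles only plain dotted numeric versions, not the full PEP 440 grammar; the helper names are hypothetical.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """'2.3.1+cu118' -> (2, 3, 1); the local label after '+' is dropped."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def same_release(installed: str, requested: str) -> bool:
    """True when both strings name the same release, ignoring local labels."""
    return release_tuple(installed) == release_tuple(requested)


def in_range(installed: str, lower: str, upper: str) -> bool:
    """True when lower <= installed < upper, compared as release tuples."""
    return release_tuple(lower) <= release_tuple(installed) < release_tuple(upper)
```

For the versions quoted above, same_release("2.3.1+cu118", "2.3.1") is True, and in_range("1.26.3", "1.26.4", "2.0.0") is False, which is the condition pip warned about.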

myhloli • Jul 30 '24 13:07

magic-pdf pdf-command --pdf "Mobile Postpaid ESAF.pdf" --inside_model true
2024-07-30 21:11:58.453 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json Mobile Postpaid ESAF.json existed
2024-07-30 21:11:58.454 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /tmp\magic-pdf\Mobile Postpaid ESAF\auto
2024-07-30 21:11:59.540 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 5094, cid_chars_radio: 0.0
2024-07-30 21:12:05.317 | INFO | magic_pdf.model.pdf_extract_kit:__init__:92 - DocAnalysis init, this may take some times. apply_layout: True, apply_formula: True, apply_ocr: False
2024-07-30 21:12:05.317 | INFO | magic_pdf.model.pdf_extract_kit:__init__:100 - using device: cuda
Traceback (most recent call last):
  File "C:\Users\buckw\.conda\envs\mu\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\buckw\.conda\envs\mu\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\buckw\.conda\envs\mu\Scripts\magic-pdf.exe\__main__.py", line 7, in <module>
    sys.exit(cli())
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\magic_pdf\cli\magicpdf.py", line 325, in pdf_command
    do_parse(
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\magic_pdf\cli\magicpdf.py", line 111, in do_parse
    pipe.pipe_analyze()
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 29, in pipe_analyze
    self.model_list = doc_analyze(self.pdf_bytes, ocr=False)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\magic_pdf\model\doc_analyze_by_custom_model.py", line 69, in doc_analyze
    custom_model = CustomPEKModel(ocr=ocr, show_log=show_log, models_dir=local_models_dir, device=device)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\magic_pdf\model\pdf_extract_kit.py", line 106, in __init__
    self.mfd_model = mfd_model_init(str(os.path.join(models_dir, self.configs["weights"]["mfd"])))
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\magic_pdf\model\pdf_extract_kit.py", line 29, in mfd_model_init
    mfd_model = YOLO(weight)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\ultralytics\models\yolo\model.py", line 23, in __init__
    super().__init__(model=model, task=task, verbose=verbose)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\ultralytics\engine\model.py", line 142, in __init__
    self._load(model, task=task)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\ultralytics\engine\model.py", line 294, in _load
    self.model, self.ckpt = attempt_load_one_weight(weights)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\ultralytics\nn\tasks.py", line 855, in attempt_load_one_weight
    ckpt, weight = torch_safe_load(weight) # load ckpt
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\ultralytics\nn\tasks.py", line 781, in torch_safe_load
    ckpt = torch.load(file, map_location="cpu")
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\ultralytics\utils\patches.py", line 86, in torch_load
    return _torch_load(*args, **kwargs)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\torch\serialization.py", line 997, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\torch\serialization.py", line 444, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\Users\buckw\.conda\envs\mu\lib\site-packages\torch\serialization.py", line 425, in __init__
    super().__init__(open(name, mode))
OSError: [Errno 22] Invalid argument: 'C:\Users\x08uckw\SynologyDrive\github\gpt\MinerU\PDF-Extract-Kit\models\MFD\weights.pt'

wuminmin • Jul 30 '24 13:07

After changing the setting to "models-dir":"C:\PDF-Extract-Kit\models", it works!
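A likely explanation, offered as an assumption rather than something confirmed in this thread: magic-pdf.json is parsed as JSON, where "\b" is the escape for a backspace character. The failing path in the error above contains \x08 where "\buckw" was expected, which is consistent with that. The exact backslash escaping in the poster's file is not visible here, so the sketch below uses a hypothetical, shortened config string; the helper names are also hypothetical.

```python
import json

# Hypothetical config text, shortened from the path in the error above.
# Every backslash is doubled except the one before "buckw".
RAW_CONFIG = r'{"models-dir": "C:\\Users\buckw\\models"}'


def load_models_dir(config_text: str) -> str:
    """Parse the JSON text and return the models-dir value."""
    return json.loads(config_text)["models-dir"]


def has_control_chars(path: str) -> bool:
    """True if the path contains ASCII control characters such as backspace."""
    return any(ord(ch) < 32 for ch in path)


def to_forward_slashes(path: str) -> str:
    """Forward slashes need no escaping in JSON and Python accepts them on Windows."""
    return path.replace("\\", "/")
```

Here load_models_dir(RAW_CONFIG) parses without an error but returns a path containing a backspace, so the failure only shows up later when the weights file is opened. Running has_control_chars on each configured path right after loading would surface the problem earlier.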

wuminmin • Jul 30 '24 13:07

Does it work yet?

wanglf1979 • Jul 31 '24 06:07

Description of the bug

magic-pdf pdf-command --pdf "E:\pdfs\test2.pdf" --inside_model true
2024-07-31 14:17:57.108 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json E:\pdfs\test2.json existed
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "E:\Python312\Scripts\magic-pdf.exe\__main__.py", line 7, in <module>
  File "E:\Python312\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\magic_pdf\cli\magicpdf.py", line 325, in pdf_command
    do_parse(
  File "E:\Python312\Lib\site-packages\magic_pdf\cli\magicpdf.py", line 90, in do_parse
    local_image_dir, local_md_dir = prepare_env(pdf_file_name, parse_method)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\magic_pdf\cli\magicpdf.py", line 56, in prepare_env
    local_parent_dir = os.path.join(get_local_dir(), "magic-pdf", pdf_file_name, method)
                                    ^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\magic_pdf\libs\config_reader.py", line 58, in get_local_dir
    config = read_config()
             ^^^^^^^^^^^^^
  File "E:\Python312\Lib\site-packages\magic_pdf\libs\config_reader.py", line 23, in read_config
    config = json.load(f)
             ^^^^^^^^^^^^
  File "E:\Python312\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python312\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid \escape: line 7 column 21 (char 180)

Operating system

Windows

Python version

3.12

Software version (magic-pdf --version)

0.6.x

Device mode

cpu

How to reproduce the bug

Package Version

boto3 1.34.151
botocore 1.34.151
Brotli 1.1.0
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
colorlog 6.8.2
cryptography 43.0.0
fast-langdetect 0.2.1
fasttext-wheel 0.9.2
filelock 3.15.4
fsspec 2024.6.1
idna 3.7
intel-openmp 2021.4.0
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
loguru 0.7.2
magic-pdf 0.6.1
MarkupSafe 2.1.5
mkl 2021.4.0
mpmath 1.3.0
networkx 3.3
numpy 1.26.4
opencv-python 4.10.0.84
pdfminer.six 20240706
pillow 10.4.0
pip 24.2
pybind11 2.13.1
pycparser 2.22
pyenv-win 3.1.1
PyMuPDF 1.24.9
PyMuPDFb 1.24.9
python-dateutil 2.9.0.post0
requests 2.32.3
robust-downloader 0.0.2
s3transfer 0.10.2
scikit-learn 1.5.1
scipy 1.14.0
setuptools 72.1.0
six 1.16.0
sympy 1.13.1
tbb 2021.13.0
threadpoolctl 3.5.0
torch 2.3.1
torchvision 0.18.1
tqdm 4.66.4
typing_extensions 4.12.2
urllib3 2.2.2
wheel 0.43.0
win32-setctime 1.1.0
wordninja 2.0.0

magic-pdf.json

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "temp-output-dir":"E:\tmp",
    "models-dir":"E:\data\models",
    "device-mode":"cpu"
}
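The JSONDecodeError above is consistent with the single backslashes in this config: in JSON, "\d" is not a valid escape, while "\t" in "E:\tmp" would be read silently as a tab. One way to avoid hand-escaping is to generate the file with json.dumps. The sketch below copies the path settings from the config above; the function names are hypothetical and not part of magic-pdf.

```python
import json
from typing import Optional, Tuple

# The two path settings from the config above, written with single backslashes.
BAD_TEXT = r'{"temp-output-dir":"E:\tmp", "models-dir":"E:\data\models"}'


def try_parse(text: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (config, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


def render_config(temp_dir: str, models_dir: str, device: str = "cpu") -> str:
    """Serialize the settings; json.dumps escapes each backslash for us."""
    config = {
        "temp-output-dir": temp_dir,
        "models-dir": models_dir,
        "device-mode": device,
    }
    return json.dumps(config, indent=4)
```

try_parse(BAD_TEXT) returns an error instead of a config, while the text from render_config(r"E:\tmp", r"E:\data\models") contains doubled backslashes and parses back to the original Windows paths.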

wanglf1979 • Jul 31 '24 06:07

We have updated to the 0.6.2b1 release, addressing and resolving the aforementioned issue.

myhloli • Aug 01 '24 02:08