MinerU

Error loading model /tmp/fasttext-langdetect/lid.176.ftz: Invalid model file.

Open · hotwa opened this issue 1 year ago · 3 comments

Description of the bug

Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980174-7.pdf on GPU 5
STDERR:
2024-07-31 09:55:08.822 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980174-7.json existed
2024-07-31 09:55:08.844 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/test/magic-pdf/0040-4020%2873%2980174-7/auto
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: Invalid model file. Please download the updated model from www.fasttext.cc. See issue #332 on Github for more information.

[2024-07-31 09:55:11,034][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: vector::_M_default_append
[2024-07-31 09:55:13,215][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 66, in get_model_loaded
    raise e
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 61, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: Invalid model file. Please download the updated model from www.fasttext.cc. See issue #332 on Github for more information.
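The warnings suggest that the cached `/tmp/fasttext-langdetect/lid.176.ftz` is damaged. Below is a minimal sketch for spotting a badly damaged cache file and deleting it so that the next run downloads a fresh copy. It assumes the fastText binary layout, in which a model starts with an int32 magic number (793712314) followed by an int32 format version no greater than 12, stored little-endian on x86. This mirrors the header check that fastText itself performs. However, it cannot detect a file that is merely truncated after a valid header. Passing `min_size` adds a crude guard for that case; a reasonable value is somewhat below the 916.0K reported by the download log later in this thread. `looks_like_fasttext_model` and `remove_if_corrupt` are hypothetical helpers, not part of MinerU or fast_langdetect.

```python
import os
import struct

# Assumed fastText header: int32 magic + int32 version, little-endian on x86.
FASTTEXT_MAGIC = 793712314
FASTTEXT_MAX_VERSION = 12


def looks_like_fasttext_model(path, min_size=0):
    """Return True if `path` has a plausible fastText header and size."""
    try:
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        return False  # missing or unreadable file
    if len(header) < 8 or size < min_size:
        return False  # too short to be a complete model
    magic, version = struct.unpack("<ii", header)
    return magic == FASTTEXT_MAGIC and 0 < version <= FASTTEXT_MAX_VERSION


def remove_if_corrupt(path, min_size=0):
    """Delete `path` if it exists but fails the check; return True if removed."""
    if os.path.exists(path) and not looks_like_fasttext_model(path, min_size):
        os.remove(path)
        return True
    return False
```

For example, call `remove_if_corrupt("/tmp/fasttext-langdetect/lid.176.ftz", min_size=900_000)` once before starting a batch.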

How to reproduce the bug

conda create -n ${CONDA_ENV_NAME} python=3.11 -y
conda activate ${CONDA_ENV_NAME}
python -m pip install 'magic-pdf[full-cpu]'
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
python -m pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
cat <<EOF > ~/magic-pdf.json
{
  "temp-output-dir": "/results/output",
  "models-dir": "/models",
  "device-mode": "cuda"
}
EOF
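The logs in this thread show several PDFs being processed in parallel, one magic-pdf process per GPU ("on GPU 2", "on GPU 4", "on GPU 6"). All of these processes share the same `/tmp/fasttext-langdetect` model cache. The reporter's launcher is not shown, so the sketch below is hypothetical. It runs a warm-up step once in the parent process (for example, loading the language-detection model so that it is downloaded a single time). Only then does it start the workers, each restricted to one GPU via the standard `CUDA_VISIBLE_DEVICES` variable. `launch_workers` and `warm_up_langdetect` are illustrative names. Calling `fast_langdetect.ft_detect.detect_language` directly is an assumption based on the tracebacks.

```python
import os
import subprocess


def launch_workers(warm_up, jobs):
    """Call warm_up() once, then start one subprocess per (gpu_id, argv) job.

    Each worker sees only its own GPU through CUDA_VISIBLE_DEVICES.
    Returns the workers' exit codes in job order.
    """
    warm_up()  # runs exactly once, before any worker process exists
    procs = []
    for gpu_id, argv in jobs:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        procs.append(subprocess.Popen(argv, env=env))
    return [p.wait() for p in procs]


def warm_up_langdetect():
    """Hypothetical warm-up: load fast_langdetect's model once so workers find it cached.

    The import path matches the ft_detect/__init__.py frame in the tracebacks;
    using it this way is an assumption, not a documented MinerU step.
    """
    from fast_langdetect.ft_detect import detect_language
    detect_language("This sentence only exists to trigger the model download.")
```

Usage would look like `launch_workers(warm_up_langdetect, [(2, argv_2), (4, argv_4)])`, where each `argv_N` is your existing magic-pdf command line. Because the warm-up runs serially, a download failure surfaces once, in the parent, instead of in every worker at the same moment.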

Operating system

Linux

Python version

3.11

Software version (magic-pdf --version)

0.6.x

Device mode

cuda

hotwa, Jul 31 '24 10:07

Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980125-5.pdf on GPU 3
STDERR:
2024-07-31 10:48:27.187 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980125-5.json existed
2024-07-31 10:48:27.206 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2873%2980125-5/auto
[2024-07-31 10:48:28,303][INFO] - Downloading https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz to lid.176.ftz (916.0K)
100%|██████████| 916k/916k [00:00<00:00, 1.87MB/s]
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: vector::_M_default_append
[2024-07-31 10:48:30,181][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: vector::_M_default_append

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 325, in pdf_command
    do_parse(
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 106, in do_parse
    pipe.pipe_classify()
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/pipe/UNIPipe.py", line 25, in pipe_classify
    self.pdf_type = AbsPipe.classify(self.pdf_bytes)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/pipe/AbsPipe.py", line 63, in classify
    pdf_meta = pdf_meta_scan(pdf_bytes)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 337, in pdf_meta_scan
    text_language = get_language(doc)
                    ^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 289, in get_language
    page_language = detect_lang(text_block)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/libs/language.py", line 12, in detect_lang
    lang_upper = detect_language(html_no_ctrl_chars)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 66, in get_model_loaded
    raise e
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 61, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: vector::_M_default_append
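In this run the model was downloaded (916.0K) and failed to load about two seconds later. One common way to ensure that no reader ever opens a half-written file works in two steps. First, download into a temporary file in the same directory. Then rename it into place with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem. The sketch below is a hypothetical helper, not fast_langdetect's actual download code. The `fetch` parameter exists only so that the network step can be swapped out, for example in tests.

```python
import os
import tempfile
import urllib.request


def _http_fetch(url, out):
    """Default fetcher: stream the response body of `url` into binary file `out`."""
    with urllib.request.urlopen(url) as resp:
        while chunk := resp.read(64 * 1024):
            out.write(chunk)


def atomic_download(url, dest, fetch=_http_fetch):
    """Download `url` to `dest` so concurrent readers never see a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dest_dir, exist_ok=True)
    # Temp file in the destination directory keeps os.replace on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            fetch(url, out)
            out.flush()
            os.fsync(out.fileno())  # ensure bytes hit the disk before the rename
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; make it readable like a normal download
        os.replace(tmp_path, dest)  # atomic on POSIX within one filesystem
    except BaseException:
        # Remove the partial temp file on any failure, then re-raise.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

For example: `atomic_download("https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz", "/tmp/fasttext-langdetect/lid.176.ftz")`, using the URL and cache path from the log above. Atomic replacement prevents torn reads but not duplicate downloads: several processes may still each fetch a full copy.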

hotwa, Jul 31 '24 10:07

Problem encountered with python==3.10:
Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980092-5.pdf on GPU 2
STDERR:
2024-07-31 11:01:30.201 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980092-5.json existed
2024-07-31 11:01:30.645 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2875%2980092-5/auto
[2024-07-31 11:01:32,758][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: cannot create std::vector larger than max_size()
[2024-07-31 11:01:33,950][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 325, in pdf_command
    do_parse(
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 106, in do_parse
    pipe.pipe_classify()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 25, in pipe_classify
    self.pdf_type = AbsPipe.classify(self.pdf_bytes)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/pipe/AbsPipe.py", line 63, in classify
    pdf_meta = pdf_meta_scan(pdf_bytes)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 337, in pdf_meta_scan
    text_language = get_language(doc)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 289, in get_language
    page_language = detect_lang(text_block)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 12, in detect_lang
    lang_upper = detect_language(html_no_ctrl_chars)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 66, in get_model_loaded
    raise e
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 61, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()

Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980220-1.pdf on GPU 4
STDERR:
2024-07-31 11:01:30.157 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980220-1.json existed
2024-07-31 11:01:30.317 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2875%2980220-1/auto
[2024-07-31 11:01:32,726][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: cannot create std::vector larger than max_size()
[2024-07-31 11:01:33,890][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 325, in pdf_command
    do_parse(
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 106, in do_parse
    pipe.pipe_classify()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 25, in pipe_classify
    self.pdf_type = AbsPipe.classify(self.pdf_bytes)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/pipe/AbsPipe.py", line 63, in classify
    pdf_meta = pdf_meta_scan(pdf_bytes)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 337, in pdf_meta_scan
    text_language = get_language(doc)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 289, in get_language
    page_language = detect_lang(text_block)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 12, in detect_lang
    lang_upper = detect_language(html_no_ctrl_chars)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 66, in get_model_loaded
    raise e
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 61, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()

Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2869%2985002-7.pdf on GPU 6
STDERR:
2024-07-31 11:01:30.206 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2869%2985002-7.json existed
2024-07-31 11:01:30.389 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2869%2985002-7/auto
[2024-07-31 11:01:32,926][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: cannot create std::vector larger than max_size()
[2024-07-31 11:01:34,122][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 325, in pdf_command
    do_parse(
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 106, in do_parse
    pipe.pipe_classify()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 25, in pipe_classify
    self.pdf_type = AbsPipe.classify(self.pdf_bytes)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/pipe/AbsPipe.py", line 63, in classify
    pdf_meta = pdf_meta_scan(pdf_bytes)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 337, in pdf_meta_scan
    text_language = get_language(doc)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 289, in get_language
    page_language = detect_lang(text_block)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 12, in detect_lang
    lang_upper = detect_language(html_no_ctrl_chars)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 66, in get_model_loaded
    raise e
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 61, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()
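All three failures above begin within the same second, on GPUs 2, 4 and 6. This is consistent with, though it does not prove, several processes writing or reading the same `/tmp/fasttext-langdetect/lid.176.ftz` at once. One way to serialize that step across processes on Linux is an advisory `fcntl.flock` lock on a sibling `.lock` file. The first process downloads while the others block; when they acquire the lock, they find the file already present. `ensure_model` and its `download` callable are hypothetical. The lock only helps if every process that downloads the model goes through it.

```python
import fcntl
import os


def ensure_model(path, download):
    """Make sure `path` exists, calling `download(path)` at most once across processes.

    Returns True if this caller performed the download, False if the file was already there.
    """
    os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until this caller owns the lock
        try:
            if os.path.exists(path):
                return False  # someone else already fetched it while we waited
            download(path)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

`download` can be any function that writes the model to the given path. Because `flock` locks belong to each separately opened file description, the same helper also serializes threads that call it independently.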

hotwa, Jul 31 '24 11:07

We have released 0.6.2b1, which resolves the issue above.

myhloli, Jul 31 '24 12:07