Error loading model /tmp/fasttext-langdetect/lid.176.ftz: Invalid model file.
Description of the bug
Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980174-7.pdf on GPU 5
STDERR:
2024-07-31 09:55:08.822 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980174-7.json existed
2024-07-31 09:55:08.844 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/test/magic-pdf/0040-4020%2873%2980174-7/auto
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: Invalid model file. Please download the updated model from www.fasttext.cc. See issue #332 on Github for more information.
[2024-07-31 09:55:11,034][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: vector::_M_default_append
[2024-07-31 09:55:13,215][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 66, in get_model_loaded
    raise e
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 61, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: Invalid model file. Please download the updated model from www.fasttext.cc. See issue #332 on Github for more information.
How to reproduce the bug
conda create -n ${CONDA_ENV_NAME} python=3.11 -y
conda activate ${CONDA_ENV_NAME}
python -m pip install magic-pdf[full-cpu]
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
python -m pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
cat <<EOF > ~/magic-pdf.json
{
  "temp-output-dir":"/results/output",
  "models-dir":"/models",
  "device-mode":"cuda"
}
EOF
Operating system
Linux
Python version
3.11
Software version (magic-pdf --version)
0.6.x
Device mode
cuda
Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980125-5.pdf on GPU 3
STDERR:
2024-07-31 10:48:27.187 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2873%2980125-5.json existed
2024-07-31 10:48:27.206 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2873%2980125-5/auto
[2024-07-31 10:48:28,303][INFO] - Downloading https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz to lid.176.ftz (916.0K)
0%| | 0.00/916k [00:00<?, ?B/s]
11%|█ | 97.0k/916k [00:00<00:00, 928kB/s]
35%|███▌ | 321k/916k [00:00<00:00, 1.65MB/s]
61%|██████ | 561k/916k [00:00<00:00, 1.96MB/s]
86%|████████▌ | 785k/916k [00:00<00:00, 2.05MB/s]
100%|██████████| 916k/916k [00:00<00:00, 1.87MB/s]
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: vector::_M_default_append
[2024-07-31 10:48:30,181][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mineru/lib/python3.11/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: vector::_M_default_append
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in
Problem encountered with python==3.10:
Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980092-5.pdf on GPU 2
STDERR:
2024-07-31 11:01:30.201 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980092-5.json existed
2024-07-31 11:01:30.645 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2875%2980092-5/auto
[2024-07-31 11:01:32,758][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: cannot create std::vector larger than max_size()
[2024-07-31 11:01:33,950][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in
Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980220-1.pdf on GPU 4
STDERR:
2024-07-31 11:01:30.157 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2875%2980220-1.json existed
2024-07-31 11:01:30.317 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2875%2980220-1/auto
[2024-07-31 11:01:32,726][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: cannot create std::vector larger than max_size()
[2024-07-31 11:01:33,890][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in
Processed /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2869%2985002-7.pdf on GPU 6
STDERR:
2024-07-31 11:01:30.206 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /tmp/libgen.scimag02425000-02425999_unzip/10.1016/0040-4020%2869%2985002-7.json existed
2024-07-31 11:01:30.389 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /results/paper/magic-pdf/0040-4020%2869%2985002-7/auto
[2024-07-31 11:01:32,926][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Error loading model /tmp/fasttext-langdetect/lid.176.ftz: cannot create std::vector larger than max_size()
[2024-07-31 11:01:34,122][WARNING] - File lid.176.ftz may be corrupted. It is recommended to re-try downloading it.
Traceback (most recent call last):
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
    lang_upper = detect_language(text)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
    model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 71, in get_model_loaded
    loaded_model = fasttext.load_model(model_path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/opt/conda/envs/mineru/lib/python3.10/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: cannot create std::vector larger than max_size()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/mineru/bin/magic-pdf", line 8, in
We have released version 0.6.2b1, which resolves this issue.
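
For anyone who cannot upgrade yet, one possible workaround is to fetch the language-ID model once, before starting the per-GPU workers. The logs in this thread show several workers failing within the same second on the same cache path, which would be consistent with concurrent downloads overwriting each other, although nothing here confirms that cause. The following is a minimal sketch under that assumption. `ensure_model`, `fetch_url`, and `MIN_BYTES` are hypothetical names and are not part of magic-pdf or fast-langdetect; the URL and cache path are copied from the log output.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# URL and cache path as they appear in the logs of this issue.
MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz"
MODEL_PATH = Path("/tmp/fasttext-langdetect/lid.176.ftz")
# Assumption: the log reports a 916.0K download, so anything much smaller is truncated.
MIN_BYTES = 900_000


def fetch_url(url: str) -> bytes:
    """Download the whole file into memory (it is under 1 MB)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def ensure_model(path=MODEL_PATH, url=MODEL_URL, min_bytes=MIN_BYTES, fetch=fetch_url) -> Path:
    """Make sure a plausibly complete model file exists at `path`.

    The file is written to a temporary name in the same directory and then
    renamed, so other processes never observe a half-written file.
    """
    path = Path(path)
    if path.is_file() and path.stat().st_size >= min_bytes:
        return path  # already present and large enough; nothing to do

    path.parent.mkdir(parents=True, exist_ok=True)
    data = fetch(url)
    if len(data) < min_bytes:
        raise ValueError(f"download looks truncated: {len(data)} bytes")

    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return path


if __name__ == "__main__":
    print(ensure_model())
```

Running this once in the launcher, before any worker calls magic-pdf, means every worker should find a complete file already in place. The size check only catches truncation; it does not verify the file's contents.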