报错ValueError: Unable to avoid copy while creating an array as requested.
Description of the bug
I am using version 0.7.0b1. While running a test, I got the error "ValueError: Unable to avoid copy while creating an array as requested." The full output is below:
2024-08-13 16:23:06.329 | ERROR | magic_pdf.tools.cli:parse_doc:69 - Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
Traceback (most recent call last):
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\libs\language.py", line 20, in detect_lang lang_upper = detect_language(text) │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... └ <function detect_language at 0x0000029B9965FEB0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\__init__.py", line 23, in detect_language lang_code = detect(sentence, low_memory=low_memory).get("lang").upper() │ │ └ True │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... └ <function detect at 0x0000029B99974B80>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\infer.py", line 81, in detect labels, scores = model.predict(text) │ │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... │ └ <function _FastText.predict at 0x0000029B9967CEE0> └ <fasttext.FastText._FastText object at 0x0000029BBE2B9D50>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 221, in predict
text = check(text)
│ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved...
└ <function _FastText.predict.
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 208, in check raise ValueError(
ValueError: predict processes one line at a time (remove '\n')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Anaconda3\envs\MinerU\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
│ │ └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\Anaconda3\envs\MinerU\Scri...
│ └ <code object
File "D:\Anaconda3\envs\MinerU\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
│ └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\Anaconda3\envs\MinerU\Scri...
└ <code object
File "D:\Anaconda3\envs\MinerU\Scripts\magic-pdf.exe\__main__.py", line 7, in <module>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1157, in __call__ return self.main(*args, **kwargs) │ │ │ └ {} │ │ └ () │ └ <function BaseCommand.main at 0x0000029B9708FAC0> └ <Command cli>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1078, in main rv = self.invoke(ctx) │ │ └ <click.core.Context object at 0x0000029B955187F0> │ └ <function Command.invoke at 0x0000029B970A45E0> └ <Command cli>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1434, in invoke return ctx.invoke(self.callback, **ctx.params) │ │ │ │ │ └ {'path': 'C:\Users\dengg\Desktop\test', 'output_dir': 'C:\Users\dengg\Desktop\test_out', 'method': 'auto'} │ │ │ │ └ <click.core.Context object at 0x0000029B955187F0> │ │ │ └ <function cli at 0x0000029BBE2CF490> │ │ └ <Command cli> │ └ <function Context.invoke at 0x0000029B9708F2E0> └ <click.core.Context object at 0x0000029B955187F0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 783, in invoke return __callback(*args, **kwargs) │ └ {'path': 'C:\Users\dengg\Desktop\test', 'output_dir': 'C:\Users\dengg\Desktop\test_out', 'method': 'auto'} └ ()
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\cli.py", line 73, in cli
parse_doc(doc_path)
│ └ WindowsPath('C:/Users/dengg/Desktop/test/Optical characteristics and energy transfer analysis of Dy3+-Pr3+ ions doped in CeF3...
└ <function cli.
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\cli.py", line 60, in parse_doc do_parse( └ <function do_parse at 0x0000029BBE2CE8C0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\common.py", line 61, in do_parse pipe.pipe_classify() │ └ <function UNIPipe.pipe_classify at 0x0000029BBE2CE5F0> └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000029BBE2B9510>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 25, in pipe_classify self.pdf_type = AbsPipe.classify(self.pdf_bytes) │ │ │ │ │ └ b'%PDF-1.7\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec... │ │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000029BBE2B9510> │ │ │ └ <staticmethod(<function AbsPipe.classify at 0x0000029B9BD66170>)> │ │ └ <class 'magic_pdf.pipe.AbsPipe.AbsPipe'> │ └ '' └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000029BBE2B9510>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\pipe\AbsPipe.py", line 63, in classify pdf_meta = pdf_meta_scan(pdf_bytes) │ └ b'%PDF-1.7\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec... └ <function pdf_meta_scan at 0x0000029B9BD65630>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\filter\pdf_meta_scan.py", line 337, in pdf_meta_scan text_language = get_language(doc) │ └ Document('', <memory, doc# 1>) └ <function get_language at 0x0000029B9BD65510>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\filter\pdf_meta_scan.py", line 289, in get_language page_language = detect_lang(text_block) │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... └ <function detect_lang at 0x0000029B9965F910>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\libs\language.py", line 23, in detect_lang lang_upper = detect_language(html_no_ctrl_chars) │ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful... └ <function detect_language at 0x0000029B9965FEB0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\__init__.py", line 23, in detect_language lang_code = detect(sentence, low_memory=low_memory).get("lang").upper() │ │ └ True │ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful... └ <function detect at 0x0000029B99974B80>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\infer.py", line 81, in detect labels, scores = model.predict(text) │ │ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful... │ └ <function _FastText.predict at 0x0000029B9967CEE0> └ <fasttext.FastText._FastText object at 0x0000029BBE2B9D50>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 228, in predict
return labels, np.array(probs, copy=False)
│ │ │ └ (0.9080705046653748,)
│ │ └
ValueError: Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
Could anyone help me figure this out?
How to reproduce the bug
As shown in the bug description above.
Operating system
Windows
Python version
3.10
Software version (magic-pdf --version)
0.6.x
Device mode
cpu
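The project is not compatible with numpy 2.x; you need to install a 1.x version. A normal installation resolves dependency versions automatically, so please follow the README.
I did install it by following the README. After fixing a problem with the fairscale module, I ran into a missing fvcore.transforms module; after installing fvcore with conda, the ValueError above appeared. Is there any way to fix this?
I just checked: numpy in the MinerU environment is 1.26.4, not 2.x, and the error still occurs.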
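A normal installation should not be missing this many dependencies. Also, installing any dependency with conda is strongly discouraged; all of the project's dependencies should be installed with pip.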
The error above is clearly caused by numpy 2.x; 1.26.4 does not trigger it.
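The version dependence can be reproduced in isolation. This is a minimal sketch, assuming only that numpy is importable; the helper names legacy_call and fixed_call are illustrative, not part of fasttext. It feeds a tuple shaped like the probs value in the traceback to the pattern fasttext uses at FastText.py line 228 and to the np.asarray replacement. On NumPy 1.x, copy=False means "copy only if needed", so the legacy call succeeds; on NumPy 2.x it means "never copy", and turning a tuple into an array always needs a copy, so it raises the ValueError seen above.

```python
import numpy as np

# probs as it appears in the traceback: a plain Python tuple of floats.
probs = (0.9080705046653748,)
NUMPY_MAJOR = int(np.__version__.split(".")[0])


def legacy_call(values):
    # Pattern from fasttext's FastText.predict (line 228 in the traceback).
    # NumPy 1.x: copy=False means "copy only if needed".
    # NumPy 2.x: copy=False means "never copy" and raises if a copy is required.
    return np.array(values, copy=False)


def fixed_call(values):
    # np.asarray copies only when needed, with the same behavior on 1.x and 2.x.
    return np.asarray(values)


try:
    legacy_call(probs)
    legacy_ok = True
except ValueError as exc:  # NumPy 2.x: a tuple cannot become an array without a copy
    legacy_ok = False
    print("np.array(..., copy=False) failed:", exc)

fixed = fixed_call(probs)
print(f"NumPy {np.__version__}: legacy call ok = {legacy_ok}, asarray -> {fixed}")
```

So seeing this exact message means the numpy that fasttext imported at run time was a 2.x release, whatever a package listing reports.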
(base) PS C:\Users\dengg> conda activate MinerU
(MinerU) PS C:\Users\dengg> conda list numpy
# packages in environment at D:\Anaconda3\envs\MinerU:
# Name Version Build Channel
numpy 1.26.4 pypi_0 pypi
(MinerU) PS C:\Users\dengg> magic-pdf -p C:\Users\dengg\Desktop\test -o C:\Users\dengg\Desktop\test_out -m auto
2024-08-13 17:19:38.772 | ERROR | magic_pdf.tools.cli:parse_doc:69 - Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
Traceback (most recent call last):
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\libs\language.py", line 20, in detect_lang lang_upper = detect_language(text) │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... └ <function detect_language at 0x0000020F706FFEB0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\__init__.py", line 23, in detect_language lang_code = detect(sentence, low_memory=low_memory).get("lang").upper() │ │ └ True │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... └ <function detect at 0x0000020F70A20B80>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\infer.py", line 81, in detect labels, scores = model.predict(text) │ │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... │ └ <function _FastText.predict at 0x0000020F7071CEE0> └ <fasttext.FastText._FastText object at 0x0000020F1545DCF0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 221, in predict
text = check(text)
│ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved...
└ <function _FastText.predict.
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 208, in check raise ValueError(
ValueError: predict processes one line at a time (remove '\n')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Anaconda3\envs\MinerU\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
│ │ └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\Anaconda3\envs\MinerU\Scri...
│ └ <code object
File "D:\Anaconda3\envs\MinerU\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
│ └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\Anaconda3\envs\MinerU\Scri...
└ <code object
File "D:\Anaconda3\envs\MinerU\Scripts\magic-pdf.exe\__main__.py", line 7, in <module>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1157, in __call__ return self.main(*args, **kwargs) │ │ │ └ {} │ │ └ () │ └ <function BaseCommand.main at 0x0000020F6E12FAC0> └ <Command cli>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1078, in main rv = self.invoke(ctx) │ │ └ <click.core.Context object at 0x0000020F6C5787F0> │ └ <function Command.invoke at 0x0000020F6E1445E0> └ <Command cli>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1434, in invoke return ctx.invoke(self.callback, **ctx.params) │ │ │ │ │ └ {'path': 'C:\Users\dengg\Desktop\test', 'output_dir': 'C:\Users\dengg\Desktop\test_out', 'method': 'auto'} │ │ │ │ └ <click.core.Context object at 0x0000020F6C5787F0> │ │ │ └ <function cli at 0x0000020F1546F490> │ │ └ <Command cli> │ └ <function Context.invoke at 0x0000020F6E12F2E0> └ <click.core.Context object at 0x0000020F6C5787F0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 783, in invoke return __callback(*args, **kwargs) │ └ {'path': 'C:\Users\dengg\Desktop\test', 'output_dir': 'C:\Users\dengg\Desktop\test_out', 'method': 'auto'} └ ()
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\cli.py", line 73, in cli
parse_doc(doc_path)
│ └ WindowsPath('C:/Users/dengg/Desktop/test/Optical characteristics and energy transfer analysis of Dy3+-Pr3+ ions doped in CeF3...
└ <function cli.
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\cli.py", line 60, in parse_doc do_parse( └ <function do_parse at 0x0000020F1546E8C0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\common.py", line 61, in do_parse pipe.pipe_classify() │ └ <function UNIPipe.pipe_classify at 0x0000020F1546E5F0> └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000020F1545D4B0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 25, in pipe_classify self.pdf_type = AbsPipe.classify(self.pdf_bytes) │ │ │ │ │ └ b'%PDF-1.7\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec... │ │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000020F1545D4B0> │ │ │ └ <staticmethod(<function AbsPipe.classify at 0x0000020F72E16170>)> │ │ └ <class 'magic_pdf.pipe.AbsPipe.AbsPipe'> │ └ '' └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000020F1545D4B0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\pipe\AbsPipe.py", line 63, in classify pdf_meta = pdf_meta_scan(pdf_bytes) │ └ b'%PDF-1.7\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec... └ <function pdf_meta_scan at 0x0000020F72E15630>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\filter\pdf_meta_scan.py", line 337, in pdf_meta_scan text_language = get_language(doc) │ └ Document('', <memory, doc# 1>) └ <function get_language at 0x0000020F72E15510>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\filter\pdf_meta_scan.py", line 289, in get_language page_language = detect_lang(text_block) │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved... └ <function detect_lang at 0x0000020F706FF910>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\libs\language.py", line 23, in detect_lang lang_upper = detect_language(html_no_ctrl_chars) │ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful... └ <function detect_language at 0x0000020F706FFEB0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\__init__.py", line 23, in detect_language lang_code = detect(sentence, low_memory=low_memory).get("lang").upper() │ │ └ True │ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful... └ <function detect at 0x0000020F70A20B80>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\infer.py", line 81, in detect labels, scores = model.predict(text) │ │ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful... │ └ <function _FastText.predict at 0x0000020F7071CEE0> └ <fasttext.FastText._FastText object at 0x0000020F1545DCF0>
File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 228, in predict
return labels, np.array(probs, copy=False)
│ │ │ └ (0.9080705046653748,)
│ │ └
ValueError: Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
Above is the output from the test I just ran; please take a look. The numpy version really is 1.26.4.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x). For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
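The error message itself makes this clear: numpy 1.x cannot produce this message.
But my environment reports numpy 1.26.4. What could be causing this?
Shouldn't you check the numpy version with pip list rather than conda list?
I checked with pip list as well, and numpy is 1.26.4 there too.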
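Since pip list reads installed package metadata rather than the module that actually gets imported, a check from inside the interpreter can show which numpy magic-pdf really loads. This is a hedged diagnostic sketch using only the standard library; numpy_report is a hypothetical helper, and it should be run with the python of the MinerU environment (the interpreter behind magic-pdf.exe). Finding more than one numpy metadata record would suggest overlapping conda and pip installs, which is one possible way for the listed and imported versions to disagree; that explanation is an assumption, not something confirmed in this thread.

```python
import importlib.metadata as md
import sys


def numpy_report():
    """Hypothetical helper: what the *running* interpreter sees for numpy."""
    report = {"python": sys.executable, "imported_version": None, "imported_from": None}
    try:
        import numpy
        report["imported_version"] = numpy.__version__
        report["imported_from"] = numpy.__file__
    except ImportError:
        pass  # numpy is not importable from this interpreter at all

    # Every numpy metadata record visible on sys.path. pip list reads records
    # like these; more than one entry points to overlapping installs.
    records = []
    for dist in md.distributions():
        try:
            name = dist.metadata.get("Name") or ""
        except Exception:  # skip broken or unreadable metadata directories
            continue
        if name.lower() == "numpy":
            installer = (dist.read_text("INSTALLER") or "unknown").strip()
            records.append((dist.version, installer))
    report["metadata_records"] = records
    return report


if __name__ == "__main__":
    for key, value in numpy_report().items():
        print(f"{key}: {value}")
```

If imported_version starts with "2." while the listing says 1.26.4, the traceback is explained by that mismatch rather than by MinerU itself.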
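How about creating a fresh conda environment and going through the whole installation again?
OK, I'll try that and come back if I run into problems.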
Find the implementation of the predict method in FastText.py and locate this line:
return labels, np.array(probs, copy=False)
Replace it with:
return labels, np.asarray(probs)
That is right!
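The one-line edit to FastText.py can be scripted so it is easy to repeat after a reinstall. This is a minimal sketch, assuming the installed file contains exactly the line quoted in this thread (other fasttext versions may differ); patch_fasttext_predict and installed_fasttext_path are hypothetical helpers. It works on raw bytes so Windows line endings and encoding are left alone, and it writes a .bak copy before changing anything.

```python
import importlib
from pathlib import Path

# The exact line quoted in this thread (fasttext's FastText.py, predict method).
OLD = b"return labels, np.array(probs, copy=False)"
NEW = b"return labels, np.asarray(probs)"


def installed_fasttext_path():
    """Path of the FastText.py that `import fasttext` would load (requires fasttext)."""
    return importlib.import_module("fasttext.FastText").__file__


def patch_fasttext_predict(path):
    """Replace the NumPy-2-incompatible line in a FastText.py file.

    Works on raw bytes so CRLF line endings and the file encoding are untouched.
    Writes a sibling '<name>.bak' backup first. Returns True if the file changed,
    False if the old line was not found (already patched, or different code).
    """
    path = Path(path)
    source = path.read_bytes()
    if OLD not in source:
        return False
    path.with_name(path.name + ".bak").write_bytes(source)
    path.write_bytes(source.replace(OLD, NEW))
    return True
```

For example, patch_fasttext_predict(installed_fasttext_path()) patches the copy in the active environment. Edits under site-packages are a local workaround: reinstalling or upgrading fasttext will overwrite them, and pinning numpy to 1.x as the maintainer suggests avoids the need for the patch.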