Table recognition raises an error
Description of the bug
After using the table recognition feature, the following error is raised:

Traceback (most recent call last):
  File "D:\wzh\MinerU-master\demo\magic_pdf_parse_main.py", line 136, in
  File "D:\wzh\MinerU-master\demo\magic_pdf_parse_main.py", line 121, in pdf_parse_main
    content_list = pipe.pipe_mk_uni_format(image_path_parent, drop_mode="none")
                   │    │                  └ 'images'
                   │    └ <function UNIPipe.pipe_mk_uni_format at 0x000001A5FD019EA0>
                   └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001A5FCFFD390>
  File "D:\wzh\MinerU-master\magic_pdf\pipe\UNIPipe.py", line 42, in pipe_mk_uni_format
    result = super().pipe_mk_uni_format(img_parent_path, drop_mode)
                                        │                └ 'none'
                                        └ 'images'
  File "D:\wzh\MinerU-master\magic_pdf\pipe\AbsPipe.py", line 51, in pipe_mk_uni_format
    content_list = AbsPipe.mk_uni_format(self.get_compress_pdf_mid_data(), img_parent_path, drop_mode)
                   │       │             │    │                            │                └ 'none'
                   │       │             │    │                            └ 'images'
                   │       │             │    └ <function AbsPipe.get_compress_pdf_mid_data at 0x000001A5E2D21510>
                   │       │             └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001A5FCFFD390>
                   │       └ <staticmethod(<function AbsPipe.mk_uni_format at 0x000001A5E2D21900>)>
                   └ <class 'magic_pdf.pipe.AbsPipe.AbsPipe'>
  File "D:\wzh\MinerU-master\magic_pdf\pipe\AbsPipe.py", line 94, in mk_uni_format
    content_list = union_make(pdf_info_list, MakeMode.STANDARD_FORMAT, drop_mode, img_buket_path)
                   │          │              │        │                │          └ 'images'
                   │          │              │        │                └ 'none'
                   │          │              │        └ 'standard_format'
                   │          │              └ <class 'magic_pdf.libs.MakeContentConfig.MakeMode'>
                   │          └ [{'preproc_blocks': [{'type': 'title', 'bbox': [170, 131, 373, 155], 'lines': [{'bbox': [171.60202026367188, 134.119171142578...
                   └ <function union_make at 0x000001A5E088D000>
  File "D:\wzh\MinerU-master\magic_pdf\dict2md\ocr_mkcontent.py", line 371, in union_make
    para_content = para_to_standard_format_v2(para_block, img_buket_path, page_idx)
                   │                          │           │               └ 0
                   │                          │           └ 'images'
                   │                          └ {'type': 'table', 'bbox': [55, 500, 487, 551], 'blocks': [{'bbox': [55, 515, 487, 551], 'type': 'table_body', 'lines': [{'bbo...
                   └ <function para_to_standard_format_v2 at 0x000001A5E088CDC0>
  File "D:\wzh\MinerU-master\magic_pdf\dict2md\ocr_mkcontent.py", line 258, in para_to_standard_format_v2
    para_content['table_body'] = f"\n\n$\n {block['lines'][0]['spans'][0]['content']}\n$\n\n"
    └ {'type': 'table', 'page_idx': 0}
KeyError: 'content'

Attached file: 1.pdf
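The last frame fails because the first span of the table body has no 'content' key. The snippet below is only a sketch of that failure mode and of a defensive lookup. It is not the project's fix. The helper name first_span_content and the sample block (including its 'image_path' key) are hypothetical and only mimic the nesting visible in the traceback.

```python
from typing import Any, Optional


def first_span_content(block: dict[str, Any]) -> Optional[str]:
    """Return block['lines'][0]['spans'][0]['content'], or None if any level is missing."""
    lines = block.get("lines") or []
    if not lines:
        return None
    spans = lines[0].get("spans") or []
    if not spans:
        return None
    return spans[0].get("content")


# Hypothetical table_body block whose span carries no 'content' key.
# 'image_path' is an assumed stand-in for whatever the real span holds.
table_body = {
    "type": "table_body",
    "lines": [{"spans": [{"type": "table", "image_path": "example.jpg"}]}],
}

try:
    # Same chained lookup as ocr_mkcontent.py line 258 in the traceback.
    table_body["lines"][0]["spans"][0]["content"]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# The defensive lookup returns None instead of raising.
content = first_span_content(table_body)
print("content:", content)
```

A caller could then decide what to emit when the result is None, for example by skipping the table body, instead of letting the whole conversion abort.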
How to reproduce the bug
Run magic_pdf_parse_main.py directly.
Operating system
Windows
Python version
3.10
Software version (magic-pdf --version)
0.6.x
Device mode
cpu
Does the version released today support table content extraction?
Yes.
@papayalove
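Thanks! One question: I installed version 0.6.2b1, so why are the tables in the output markdown still rendered as images? I set "is_table_recog_enable": true in magic-pdf.json, but it made no difference. Any help would be appreciated.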
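One thing that can be ruled out is whether the flag is really set in the file being read. The sketch below searches a JSON config for every occurrence of "is_table_recog_enable" without assuming where in the file the key sits. The location ~/magic-pdf.json is an assumption, and find_key and report are hypothetical helpers, not part of magic-pdf. It cannot make table recognition work in a version that does not support it.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def find_key(node: Any, key: str, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}" if path else k
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")


def report(config_path: Path, key: str = "is_table_recog_enable") -> list[tuple[str, Any]]:
    """Print where `key` is set in the config file and return the hits."""
    if not config_path.is_file():
        print(f"{config_path} not found")
        return []
    data = json.loads(config_path.read_text(encoding="utf-8"))
    hits = list(find_key(data, key))
    for where, value in hits:
        print(f"{where} = {value!r}")
    if not hits:
        print(f"{key} is not set in {config_path}")
    return hits


if __name__ == "__main__":
    # Assumed location; adjust to wherever your magic-pdf.json lives.
    report(Path.home() / "magic-pdf.json")
```

If the flag prints as True and tables are still emitted as images, the configuration is not the cause.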
The maintainer just said it still does not work for now, so we will have to wait for the 0.7.x release.
The bug has been fixed.