ragflow

[Bug]: Incorrect parsing PDF layout

Open James-Dao opened this issue 1 year ago • 8 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch name

main

Commit ID

51efecf

Other environment information

No response

Actual behavior

Use a PDF that contains tables, or a PDF with a left-right (two-column) layout, and process the data with the general method.

Expected behavior

No response

Steps to reproduce

Use a PDF that contains tables, or a PDF with a left-right (two-column) layout, and process the data with the general method.

Additional information

No response

James-Dao avatar Oct 10 '24 06:10 James-Dao

With general, and with the layout capability enabled, layout recognition is often inaccurate.

James-Dao avatar Oct 10 '24 06:10 James-Dao

Which version? Online demo or local deployment?

JinHai-CN avatar Oct 10 '24 06:10 JinHai-CN

Both the online and the local versions have this problem. The version is 0.11.

James-Dao avatar Oct 10 '24 09:10 James-Dao

Versions 0.9 through 0.11 all have similar problems. As a result, PDF data processed with layout ends up misaligned or lost.

James-Dao avatar Oct 10 '24 10:10 James-Dao

Many things can cause layout parsing errors. If possible, could you please attach the problematic data and a screenshot to this issue?

By the way, we intend to create an international community, so we encourage using English for communication.

JinHai-CN avatar Oct 10 '24 11:10 JinHai-CN

image

James-Dao avatar Oct 10 '24 13:10 James-Dao

企业微信截图_893c5ae1-1acf-4808-bcd4-5ce9be04b2fa

James-Dao avatar Oct 10 '24 13:10 James-Dao

Any update?

James-Dao avatar Oct 24 '24 04:10 James-Dao

I have the same issue:

original: image

parsed: image

dassio avatar Dec 17 '24 05:12 dassio

This should be much better in the latest version.

KevinHuSh avatar Dec 17 '24 09:12 KevinHuSh