[Bug]: The code block in the PDF is identified as a table.

Open • chenhanbiao opened this issue 10 months ago • 1 comment

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

RAGFlow workspace code commit ID

null

RAGFlow image version

newest

Other environment information

Chunk Method: General

Actual behavior

Original image: Identification result:

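<table>
<tr><td  >Taking the json output format as an example, the following demonstrates the data output order for different JMESPath expressions:</td></tr>
<tr><td  >Example 1:</td></tr>
<tr><td  >When the expression points to an object level, KooCLI outputs the corresponding values in the lexicographic order of that object's property names. In</td></tr>
<tr><td  >this example the expression points to the object "items[0]", whose properties, sorted lexicographically, are:</td></tr>
<tr><td  >apiVersion, kind, metadata, spec, status. The output is therefore as follows:</td></tr>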
<tr><td  >hcloud CCE ListClusters --cli-region="ap-southeast-1" --type="VirtualMachine" --  projectid*dd8*c****9b5a846-*li-quy*it*s0]"</td></tr>
<tr><td  >apiVersion": "v3",</td></tr>
<tr><td  >"Kkind": "Cluster",</td></tr>
<tr><td  >"metadata": </td></tr>
<tr><td  >"creationTimestamp": "2022-05-13 08:51:58.252509 +0000 UTC",</td></tr>
<tr><td  >"labels": </td></tr>
<tr><td  >"FeatureGates": "elbv3.,"</td></tr>
<tr><td  >1</td></tr>
<tr><td  >"name*githu.*****"</td></tr>
<tr><td  >"uid*+*****************101534",</td></tr>
<tr><td  >"updateTimestamp": "2022-05-13 09:10:06.395875 +0000 UTC"</td></tr>
<tr><td  >了</td></tr>
<tr><td  >"spec": {</td></tr>
<tr><td  >"authentication": {</td></tr>
<tr><td  >"authenticatingProxy": 0,</td></tr>
<tr><td  >"mode": "rbac'"</td></tr>
<tr><td  >"az": "multi_az",</td></tr>
<tr><td  >"billingMode": 0, </td></tr>
<tr><td  >"category": "CCE",</td></tr>
<tr><td  >"containerNetwork": {</td></tr>
<tr><td  >41</td></tr>
</table>
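
Every row in the result above contains exactly one cell, which is what a code block split line by line looks like rather than a real table. As a rough illustration only (this is not RAGFlow's code, and the names `CellCounter` and `looks_like_single_column` are hypothetical), a post-processing check along these lines could flag such output, assuming the recognized table is available as an HTML string:

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts the <td>/<th> cells in each <tr> of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.cells_per_row = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Start a new row with zero cells seen so far.
            self.cells_per_row.append(0)
        elif tag in ("td", "th") and self.cells_per_row:
            self.cells_per_row[-1] += 1


def looks_like_single_column(html_fragment):
    """Return True if the fragment has rows and every row has exactly one cell."""
    parser = CellCounter()
    parser.feed(html_fragment)
    parser.close()
    return bool(parser.cells_per_row) and all(n == 1 for n in parser.cells_per_row)


# Three rows copied from the identification result above.
sample = (
    '<tr><td  >"spec": {</td></tr>'
    '<tr><td  >"authentication": {</td></tr>'
    '<tr><td  >"authenticatingProxy": 0,</td></tr>'
)
print(looks_like_single_column(sample))
```

A single-column result is only a hint, since genuine one-column tables exist, so a check like this would at most mark candidates for being treated as plain text or code.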

Expected behavior

No response

Steps to reproduce

Parse a PDF that contains code blocks using the General chunk method.

Additional information

No response

chenhanbiao, Feb 27 '25 06:02

We're going to improve that.

KevinHuSh, Feb 28 '25 04:02