ragflow
[Bug]: The code block in the PDF is identified as a table.
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
null
RAGFlow image version
newest
Other environment information
Chunk Method: General
Actual behavior
Original image: Identification result:
<table>
<tr><td >以json输出格式为例,示范不同JMESPath表达式的数据输出顺序:</td></tr>
<tr><td >示例1:</td></tr>
<tr><td >指定至对象层级,KooCLI会以该对象各属性名的字典顺序,输出其对应的值。在</td></tr>
<tr><td >此示例中指定至对象"items[0]",该对象中各属性按字典顺序排序后为:</td></tr>
<tr><td >apiVersion,kind,metadata,spec,status。因此输出结果如下:</td></tr>
<tr><td >hcloud CCE ListClusters --cli-region="ap-southeast-1" --type="VirtualMachine" -- projectid*dd8*c****9b5a846-*li-quy*it*s0]"</td></tr>
<tr><td >apiVersion": "v3",</td></tr>
<tr><td >"Kkind": "Cluster",</td></tr>
<tr><td >"metadata": </td></tr>
<tr><td >"creationTimestamp": "2022-05-13 08:51:58.252509 +0000 UTC",</td></tr>
<tr><td >"labels": </td></tr>
<tr><td >"FeatureGates": "elbv3.,"</td></tr>
<tr><td >1</td></tr>
<tr><td >"name*githu.*****"</td></tr>
<tr><td >"uid*+*****************101534",</td></tr>
<tr><td >"updateTimestamp": "2022-05-13 09:10:06.395875 +0000 UTC"</td></tr>
<tr><td >了</td></tr>
<tr><td >"spec": {</td></tr>
<tr><td >"authentication": {</td></tr>
<tr><td >"authenticatingProxy": 0,</td></tr>
<tr><td >"mode": "rbac'"</td></tr>
<tr><td >"az": "multi_az",</td></tr>
<tr><td >"billingMode": 0, </td></tr>
<tr><td >"category": "CCE",</td></tr>
<tr><td >"containerNetwork": {</td></tr>
<tr><td >41</td></tr>
</table>
Expected behavior
No response
Steps to reproduce
Parse a PDF that contains code blocks using the General chunk method.
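One way to confirm the symptom on a parsed chunk is sketched below. It is only a rough heuristic, not part of RAGFlow: the names `RowCollector` and `looks_like_misparsed_code` and the thresholds `min_rows` and `code_ratio` are hypothetical. It flags an HTML table whose rows all have a single cell and mostly contain code-like punctuation, which is the shape of the identification result above.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def looks_like_misparsed_code(html, min_rows=5, code_ratio=0.5):
    """Return True if the table is single-column and mostly code-like rows.

    Hypothetical heuristic: thresholds are assumptions, not tuned values.
    """
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    rows = [r for r in parser.rows if r]
    # A real table usually has more than one cell per row.
    if len(rows) < min_rows or any(len(r) != 1 for r in rows):
        return False
    code_chars = set('{}[]":=')
    code_like = sum(1 for r in rows if any(c in code_chars for c in r[0]))
    return code_like / len(rows) >= code_ratio
```

Passing the HTML of a table chunk to `looks_like_misparsed_code` returns `True` when the chunk is likely a code block that was recognized as a table.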
Additional information
No response
We're going to improve that.