[Question]: Parse an Excel file with the Table chunk method
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
I want to parse an Excel file with the Table chunk method. The file has several sheets.
The file parses smoothly, but how do I distinguish the chunks that come from different sheets?
To parse an Excel file using the table chunk method and distinguish between different sheet chunks, you can use the html method from the deepdoc/parser/excel_parser.py file in the ragflow codebase. This method processes each sheet in the workbook separately and creates HTML tables for each chunk of rows. Each table includes a caption with the sheet name, which helps distinguish between different sheet chunks.
Here's a simplified explanation of the code:
def html(self, fnm, chunk_rows=256):
    file_like_object = BytesIO(fnm) if not isinstance(fnm, str) else fnm
    wb = RAGFlowExcelParser._load_excel_to_workbook(file_like_object)
    tb_chunks = []
    for sheetname in wb.sheetnames:
        ws = wb[sheetname]
        rows = list(ws.rows)
        if not rows:
            continue
        # The first row of each sheet is reused as the header of every chunk.
        tb_rows_0 = "<tr>"
        for t in list(rows[0]):
            tb_rows_0 += f"<th>{t.value}</th>"
        tb_rows_0 += "</tr>"
        for chunk_i in range((len(rows) - 1) // chunk_rows + 1):
            tb = ""
            # The caption carries the sheet name for this chunk.
            tb += f"<table><caption>{sheetname}</caption>"
            tb += tb_rows_0
            for r in list(
                rows[1 + chunk_i * chunk_rows: 1 + (chunk_i + 1) * chunk_rows]
            ):
                tb += "<tr>"
                for i, c in enumerate(r):
                    if c.value is None:
                        tb += "<td></td>"
                    else:
                        tb += f"<td>{c.value}</td>"
                tb += "</tr>"
            tb += "</table>\n"
            tb_chunks.append(tb)
    return tb_chunks
This method returns a list of HTML table strings, each representing a chunk of rows from a specific sheet, with the sheet name included in the table caption [1].
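Since each chunk carries its sheet name in the caption, the origin of a chunk can be recovered from the chunk text itself. The following is a minimal sketch, assuming the chunks look like the strings built by the html method above; the helper names `sheet_name_of` and `group_by_sheet` and the sample data are hypothetical and not part of RAGFlow.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Matches the caption that the html method writes at the start of each table.
CAPTION_RE = re.compile(r"<caption>(.*?)</caption>", re.DOTALL)


def sheet_name_of(chunk: str) -> Optional[str]:
    """Return the sheet name stored in a chunk's caption, or None if absent."""
    match = CAPTION_RE.search(chunk)
    return match.group(1) if match else None


def group_by_sheet(chunks: List[str]) -> Dict[Optional[str], List[str]]:
    """Group HTML table chunks by the sheet name found in their caption."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[sheet_name_of(chunk)].append(chunk)
    return dict(grouped)


# Hypothetical chunks shaped like the output of the html method.
sample_chunks = [
    "<table><caption>Sales</caption><tr><th>id</th></tr><tr><td>1</td></tr></table>\n",
    "<table><caption>Sales</caption><tr><th>id</th></tr><tr><td>2</td></tr></table>\n",
    "<table><caption>Staff</caption><tr><th>name</th></tr><tr><td>Ann</td></tr></table>\n",
]
grouped = group_by_sheet(sample_chunks)
for sheet, items in grouped.items():
    print(sheet, len(items))
```

A regular expression is enough here because the caption is written as plain text directly after the opening table tag; a full HTML parser would be the safer choice if sheet names could contain markup.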
The sheet name will be concatenated to the end of every chunk if you apply the General chunking method.
chunk method use General
Thanks for replying, sir.
Two more questions:
- What does the "table" chunk method do? In which scenarios should I use it?
- If I have several sheets, how can I tell which sheet a chunk came from after parsing the file with the "table" method?
chunk method use General
How about the "table" chunk method?
What does the "table" chunk method do? In which scenarios should I use it?
It's usually used for SQL search over structured data.
Thanks, sir. One more question. This is the original Excel file:
This is the result:
How do I use the SQL search you mentioned?
If any KB uses Table as its chunk method, RAGFlow will turn users' questions into ES SQL queries.
So the Table method is usually used for data dumped from a DB. That way, people do not need to write SQL themselves to query it.
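To illustrate the idea only, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Elasticsearch SQL. The table name, columns, sample rows, and the mapping from question to query are all hypothetical; in RAGFlow the SQL is generated from the user's question, which this sketch does not reproduce.

```python
import sqlite3

# Hypothetical rows, as if dumped from a database into a spreadsheet.
rows = [
    ("Alice", "Sales", 5200),
    ("Bob", "Sales", 4800),
    ("Carol", "Support", 4100),
]

# sqlite3 stands in for the real search backend in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# A question such as "What is the average salary in Sales?" corresponds to a
# structured query like this one, which the user never has to write.
avg_sales = conn.execute(
    "SELECT AVG(salary) FROM employees WHERE department = ?", ("Sales",)
).fetchone()[0]
print(avg_sales)
conn.close()
```

The point of the sketch is that each spreadsheet row behaves like a database record, so aggregate questions can be answered with a structured query rather than by text similarity.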
OK, sir. Where can I find a demo of "RAGFlow will turn users' questions into ES SQL to query"? After the xlsx is parsed into JSON blocks by the table method, how do I query it?
I have built an assistant like this:
How do I "turn users' questions into ES SQL to query"?
@sanwei111 It applies whenever any KB uses Table as its chunk method.