
[Question]: The "Table" parsing method fails to parse an xlsx file

Open water-2022 opened this issue 1 year ago • 12 comments

Describe your problem

The error is as follows:

Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 146, in build
    cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
  File "/ragflow/rag/app/table.py", line 141, in chunk
    dfs = excel_parser(
  File "/ragflow/rag/app/table.py", line 66, in __call__
    res.append(pd.DataFrame(np.array(data), columns=headers))
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 816, in __init__
    mgr = ndarray_to_mgr(
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/internals/construction.py", line 336, in ndarray_to_mgr
    _check_values_indices_shape_match(values, index, columns)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/internals/construction.py", line 420, in _check_values_indices_shape_match
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (0, 1), indices imply (0, 7)

water-2022 avatar Jun 24 '24 06:06 water-2022
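
The last line of the traceback is consistent with a sheet that has seven header cells but no data rows: `np.array([])` has shape `(0,)`, which pandas treats as a single column. The following is a minimal sketch of that failure and of one possible guard, not the RAGFlow fix. The column names and the `build_frame` helper are hypothetical, and the exact wording of the exception message varies between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the traceback only says there were 7 headers.
headers = [f"col_{i}" for i in range(7)]


def build_frame(rows, headers):
    """Build a DataFrame from a list of rows, tolerating an empty list.

    Hypothetical helper for illustration only.
    """
    if not rows:
        # No data rows: return an empty frame that still carries the headers.
        return pd.DataFrame(columns=headers)
    return pd.DataFrame(np.array(rows, dtype=object), columns=headers)


# Reproduce the failure: an empty array cannot be matched to 7 columns.
try:
    pd.DataFrame(np.array([]), columns=headers)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# The guarded version handles both the empty and the non-empty case.
empty_frame = build_frame([], headers)
one_row = build_frame([[1, 2, 3, 4, 5, 6, 7]], headers)
print(empty_frame.shape, one_row.shape)
```

The guard only covers the case of zero data rows. Rows of unequal length, which merged cells can produce, would need separate handling.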

Could you attach the file?

KevinHuSh avatar Jun 25 '24 00:06 KevinHuSh

Could you attach the file?

The files are company property, so I can't share them. The Excel file contains a very large number of sheets, and the tables have some merged cells.

water-2022 avatar Jun 26 '24 09:06 water-2022

Could you mock up a sample that fails?

KevinHuSh avatar Jun 27 '24 01:06 KevinHuSh

Could you mock up a sample that fails?

@KevinHuSh This is a sample file.

1.xlsx

water-2022 avatar Jul 01 '24 08:07 water-2022

Hello, how do you use the parse module? Is it through command line interface (CLI) or deployment?

ZhichengQian1 avatar Jul 03 '24 09:07 ZhichengQian1

Hello, how do you use the parse module? Is it through command line interface (CLI) or deployment?

You can configure the Chunk method in the portal.

企业微信截图_17200610601470

water-2022 avatar Jul 04 '24 02:07 water-2022

Hello, how do you use the parse module? Is it through command line interface (CLI) or deployment?

You can configure the Chunk method in the portal.

企业微信截图_17200610601470

Thank you, but I just want to utilize the parse module.

ZhichengQian1 avatar Jul 04 '24 02:07 ZhichengQian1

Thank you, but I just want to utilize the parse module.

Maybe you can look at the source code of Ragflow.

water-2022 avatar Jul 05 '24 05:07 water-2022

Excuse me, has this problem been solved? I've run into it as well.

lvyoudashuju avatar Sep 14 '24 08:09 lvyoudashuju

@lvyoudashuju We intend to create an international community, so we encourage using English for communication.

@KevinHuSh Any updates on this issue?

JinHai-CN avatar Sep 14 '24 09:09 JinHai-CN

Could you mock up a sample that fails?

@KevinHuSh This is a sample file.

1.xlsx

This file can be parsed without problems on the demo site.

KevinHuSh avatar Sep 14 '24 09:09 KevinHuSh

Excuse me, has this problem been solved? I've run into it as well.

Could you elaborate on your issue? Attach the sample file if that's okay with you.

KevinHuSh avatar Sep 14 '24 09:09 KevinHuSh

Excuse me, has this problem been solved? I've run into it as well.

Could you elaborate on your issue? Attach the sample file if that's okay with you.

image

lvyoudashuju avatar Sep 18 '24 02:09 lvyoudashuju

Steps to reproduce: create a new knowledge base, select the "Table" parsing method, and import an Excel file (for example, 1.xlsx).

I tried it, and it seems that when there are more than two sheets in the Excel file, this issue occurs.

Even with only two sheets, only the first sheet is parsed, and the subsequent sheets are not.

JokerRun avatar Sep 21 '24 14:09 JokerRun

The 'Table' method is intended for data dumped from a database, so you'd be better off parsing this file with 'General'.

KevinHuSh avatar Sep 23 '24 02:09 KevinHuSh

Steps to reproduce: create a new knowledge base, select the "Table" parsing method, and import an Excel file (for example, 1.xlsx).

I tried it, and it seems that when there are more than two sheets in the Excel file, this issue occurs.

Even with only two sheets, only the first sheet is parsed, and the subsequent sheets are not.

That's not exactly what I see. I uploaded an Excel file containing 2 sheets and it was parsed successfully, but when I uploaded another Excel file containing 4 sheets, parsing failed.

lvyoudashuju avatar Sep 23 '24 02:09 lvyoudashuju
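
Since a two-sheet file parsed and a four-sheet file did not, the sheet count alone may not be the trigger. One possibility, consistent with the traceback but not confirmed in this thread, is that one of the sheets has a header row and no data rows. The sketch below flags such sheets. It assumes the rows of each sheet have already been read into plain lists, and `sheets_without_data` is a hypothetical helper, not part of RAGFlow.

```python
from typing import Any, Dict, List, Sequence


def sheets_without_data(sheets: Dict[str, Sequence[Sequence[Any]]]) -> List[str]:
    """Return the names of sheets that have no non-empty row after the header.

    The first row of each sheet is assumed to be the header row.
    """
    flagged = []
    for name, rows in sheets.items():
        data_rows = [
            row
            for row in rows[1:]
            # A row counts as data if at least one cell holds a value.
            if any(cell is not None and str(cell).strip() != "" for cell in row)
        ]
        if not data_rows:
            flagged.append(name)
    return flagged


# Hypothetical workbook contents, keyed by sheet name.
workbook = {
    "Sheet1": [["id", "name"], [1, "a"]],   # header plus one data row
    "Sheet2": [["id", "name"]],             # header only
    "Sheet3": [],                           # completely empty sheet
    "Sheet4": [["id", "name"], [None, ""]], # header plus a blank row
}

print(sheets_without_data(workbook))  # ['Sheet2', 'Sheet3', 'Sheet4']
```

With a real file, the per-sheet row lists could be collected with a reader such as openpyxl before calling the helper; removing or filling the flagged sheets and re-uploading would show whether they are what breaks the 'Table' method.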

As @KevinHuSh said, the 'Table' method is intended for data dumped from a database, so you'd be better off parsing this file with 'General'.

yuzhichang avatar Nov 27 '24 09:11 yuzhichang