[Question]: Table parsing got 31 chunks (one chunk = one entry/issue) but chat answer can only display 7 entries for the whole table

Open Bob123Yang opened this issue 7 months ago • 10 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

  1. Knowledge base created successfully:
     a. Set the embedding model to bge-m3.
     b. Set chunk method = table.
     c. Uploaded one xlsx file to RAGFlow with 31 rows. Each row represents one entry/issue and has only two columns: column A is the issue ID and column B is the issue summary (a very simple table).
     d. The parsing result for this xlsx file is shown below: each row is one chunk, so the 31 rows became 31 chunks, and I checked that each chunk shows the correct issue.

  2. But the chat result is frustrating: it shows only 7 entries after searching the whole table, a big gap from the file parsing result of 31 chunks. (The chat model is qwen3 32b.)

Could you tell me how to adjust the settings of the knowledge base or the chat assistant to get the correct issue list, since the file parsing result is already completely correct (31 chunks representing 31 issues)? Thanks.

Image

Image

Image

Bob123Yang avatar May 18 '25 02:05 Bob123Yang

I found a similar resolved issue where the chat did not display all the information from the knowledge base. The suggested solution was to check the content sent to the LLM by clicking the little light bulb icon. If the relevant information is sent to the LLM, switching to another LLM might resolve the issue [1].

Regarding your current situation, the discrepancy between the number of chunks parsed (31) and the number displayed in the chat (7) could be influenced by several factors:

  1. Top N Setting: The default setting for "Top N" in RAGFlow limits the number of chunks fed to the LLM. You might want to increase this setting to ensure more chunks are considered during retrieval. The default value is 8, which might be why only 7 entries are displayed [2].

  2. Similarity Threshold: Ensure that the similarity threshold is set appropriately. If the threshold is too high, it might filter out chunks that are relevant but have lower similarity scores. The default threshold is 0.2 [2].

  3. Keyword Similarity Weight: Adjusting the keyword similarity weight might help in retrieving more relevant chunks. The default weight is 0.7, which influences the hybrid score system used for evaluating chunk relevance [2].

Consider reviewing these settings in your knowledge base configuration to improve the retrieval and display of chunks in the chat interface.
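
To illustrate how these three settings could interact, here is a minimal TypeScript sketch. It assumes a simplified hybrid score (keyword weight times keyword similarity, plus the remaining weight times vector similarity). The `TableChunk` type, the `selectChunks` function, and the similarity values are hypothetical and are not taken from RAGFlow's retrieval code.

```typescript
// Hypothetical chunk shape; the field names are illustrative, not RAGFlow's schema.
interface TableChunk {
  id: string;
  keywordSim: number; // keyword (term) similarity in [0, 1]
  vectorSim: number; // embedding similarity in [0, 1]
}

interface RetrievalSettings {
  topN: number;
  similarityThreshold: number;
  keywordWeight: number;
}

// Assumed hybrid score: weighted sum of keyword and vector similarity.
function hybridScore(chunk: TableChunk, keywordWeight: number): number {
  return keywordWeight * chunk.keywordSim + (1 - keywordWeight) * chunk.vectorSim;
}

// Drop chunks below the threshold, rank the rest, and keep at most topN of them.
function selectChunks(chunks: TableChunk[], settings: RetrievalSettings): TableChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: hybridScore(chunk, settings.keywordWeight) }))
    .filter((scored) => scored.score >= settings.similarityThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, settings.topN)
    .map((scored) => scored.chunk);
}

// 31 synthetic chunks, all comfortably above the threshold.
const parsedChunks: TableChunk[] = [];
for (let i = 1; i <= 31; i++) {
  parsedChunks.push({ id: "issue-" + i, keywordSim: 0.5, vectorSim: 0.5 });
}

const withDefaultTopN = selectChunks(parsedChunks, { topN: 8, similarityThreshold: 0.2, keywordWeight: 0.7 });
const withTopN30 = selectChunks(parsedChunks, { topN: 30, similarityThreshold: 0.2, keywordWeight: 0.7 });
console.log(withDefaultTopN.length, withTopN30.length);
```

In this sketch every chunk passes the threshold, so Top N alone decides how many of the 31 chunks reach the LLM.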

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar May 18 '25 02:05 dosubot[bot]

@dosu

Thank you. I made some progress: the chat assistant returns 30 entries after I changed the Top N setting to its max value of 30. But I actually have 31 chunks for 31 entries, as I mentioned in my first question, so how can I make the chat assistant return all 31 entries? In other words, how can we feed more than 30 chunks to the LLM? A complex, long document can easily be split into more than 31 chunks, all of which should be passed to the LLM at the same time.

Maybe this problem of feeding all chunks of a complex, long document to the LLM at the same time could be resolved in two ways:

  1. Increase the number of chunks fed to the LLM to a much larger value.
  2. If the number of chunks fed to the LLM at once must be a fixed value, provide a way to keep feeding additional chunks to the LLM without an upper limit.

Thanks a lot!

Bob123Yang avatar May 18 '25 11:05 Bob123Yang

The maximum value for the 'Top N' setting in RAGFlow is 1000, and it can be adjusted by modifying the max parameter in the configuration. You can increase this limit to accommodate more chunks by changing the configuration as follows:

<TopNItem initialValue={30} max={1000}></TopNItem>

This will allow you to set a higher 'Top N' value, potentially up to 1000, which should help in feeding more chunks to the LLM [1].

dosubot[bot] avatar May 18 '25 11:05 dosubot[bot]

@dosu

Which configuration file should I modify to make the change below? <TopNItem initialValue={30} max={1000}></TopNItem>

Furthermore, if my xlsx document has about 1500 entries, which is more than 1000, how can I handle that?

Thanks.

Bob123Yang avatar May 19 '25 03:05 Bob123Yang

The 'Top N' setting in RAGFlow is defined as the "top_n" attribute within the "prompt" object. You can modify this setting by adjusting the "top_n" value in the JSON object for the prompt configuration [1][2][3].

For handling datasets with more than 1000 entries, you can use the auto-tagging feature. This feature automatically maps tags from user-defined tag sets to relevant chunks within your knowledge base based on similarity. It allows you to apply an additional layer of domain-specific knowledge to existing datasets, which is particularly useful when dealing with a large number of chunks. To use this feature, configure tag sets on the Configuration page of your knowledge base and re-parse your documents to initiate the auto-tagging process [4].

dosubot[bot] avatar May 19 '25 03:05 dosubot[bot]

@dosu

Which configuration file should I modify to make the change below?

TopNItem initialValue={30} max={1000}

Bob123Yang avatar May 19 '25 03:05 Bob123Yang

To change the TopNItem initial value to 30 and max to 1000, you should modify the configuration in the following files:

  1. web/src/pages/flow/form/exesql-form/index.tsx file where the TopNItem is defined. The relevant configuration is:

    <TopNItem initialValue={30} max={1000}></TopNItem>
    
  2. If you are working with the AkShareForm, you would need to modify the web/src/pages/flow/form/akshare-form/index.tsx file. The current configuration is:

    <TopNItem initialValue={10} max={99}></TopNItem>
    

    You should change it to:

    <TopNItem initialValue={30} max={1000}></TopNItem>
    
  3. Additionally, if you are working with the WenCaiForm, you should modify the web/src/pages/flow/form/wencai-form/index.tsx file. The current configuration is:

    <TopNItem initialValue={20} max={99}></TopNItem>
    

    You should change it to:

    <TopNItem initialValue={30} max={1000}></TopNItem>
    

These changes will allow you to set a higher 'Top N' value, potentially up to 1000, which should help in feeding more chunks to the LLM [1][2][3][4].

dosubot[bot] avatar May 19 '25 03:05 dosubot[bot]

@dosu

I have modified the files you mentioned, as shown below, and made sure each file contains the configuration: TopNItem initialValue={30} max={1000}

test@test-test-Product:~/ragflow$ cat web/src/pages/flow/form/exesql-form/index.tsx | grep initialValue
      <TopNItem initialValue={30} max={1000}></TopNItem>
test@test-test-Product:~/ragflow$ cat web/src/pages/flow/form/akshare-form/index.tsx | grep initialValue
      <TopNItem initialValue={30} max={1000}></TopNItem>
test@test-test-Product:~/ragflow$ cat web/src/pages/flow/form/wencai-form/index.tsx | grep initialValue
      <TopNItem initialValue={30} max={1000}></TopNItem>

But the max value of the Top N setting in the Chat Assistant is still 30 instead of 1000, even after I restarted all the RAGFlow services, logged in again, and created a new Chat Assistant. Is anything missing?

Image

Bob123Yang avatar May 20 '25 02:05 Bob123Yang

It seems like the changes you made to the configuration files are not being reflected in the Chat Assistant interface. Here are a few steps you can take to troubleshoot and resolve this issue:

  1. Clear Cache: RAGFlow uses a caching mechanism, possibly with Redis, which might be retaining the old settings. Ensure that you clear the cache to apply the new configuration changes. You can do this by restarting the Redis service or using a command to clear the cache explicitly [1].

  2. Verify Configuration Files: Double-check that the changes have been saved correctly in the configuration files. Ensure there are no syntax errors or issues that might prevent the new settings from being applied.

  3. Environment Variables: Check if there are any environment variables or global settings that might override the configuration changes. These might need to be updated to reflect the new 'Top N' setting.

  4. Service Restart: Ensure that all relevant services are restarted after making configuration changes. Sometimes, a simple restart might not be sufficient, and a full system reboot might be necessary.

  5. Logs and Debugging: Check the logs for any errors or warnings that might indicate why the changes are not being applied. This can provide insights into any underlying issues.

If these steps do not resolve the issue, consider reviewing the documentation or codebase for any additional settings or dependencies that might need to be adjusted.

dosubot[bot] avatar May 20 '25 02:05 dosubot[bot]

I restarted all the RAGFlow-related services and created a new Chat Assistant, but the Top N max value in the Chat Assistant is still 30 rather than 1000.

Can anyone help?

Bob123Yang avatar May 20 '25 03:05 Bob123Yang

@dosu

I modified the file ./web/src/components/top-n-item.tsx, changing these two lines:

const TopNItem = ({ initialValue = 8, max = 30 }: IProps) => {
export function TopNFormField({ max = 30 }: SimilaritySliderFormFieldProps) {

I changed them to:

const TopNItem = ({ initialValue = 8, max = 1000 }: IProps) => {
export function TopNFormField({ max = 1000 }: SimilaritySliderFormFieldProps) {

But Top N in Chat Configuration still defaults to 8 with a max of 30. How can I change the max value of Top N in Chat Configuration to 1000?

Image

Bob123Yang avatar Jun 26 '25 02:06 Bob123Yang

To make your changes to the Top N max value take effect in the Chat Configuration UI, you need to fully rebuild and redeploy the frontend. Just restarting services is not enough, as the UI is built from static assets that must be regenerated. Here’s what to do:

  1. In the web directory, delete the src/.umi, src/.umi-production, and node_modules folders.
  2. Run npm install to reinstall dependencies.
  3. Run npm run build to rebuild the frontend assets.
  4. Restart the frontend server.
  5. In your browser, do a hard refresh (Ctrl+F5 or Shift+F5) to clear the cache and load the new UI.

This should update the Top N slider to show a max value of 1000 in the Chat Assistant configuration UI. If it still shows 30, double-check that your changes to max=1000 are saved in web/src/components/top-n-item.tsx and that you are editing the correct deployment environment.
dosubot[bot] avatar Jun 26 '25 02:06 dosubot[bot]

Hi @dosu, how do I restart the frontend server?

Bob123Yang avatar Jun 26 '25 04:06 Bob123Yang

To restart the RAGFlow frontend server after making code changes:

  1. Go to the web directory.
  2. Stop any running frontend process (for example, with pkill npm).
  3. (Optional but recommended after code changes) Run npm install to ensure dependencies are up to date.
  4. Run npm run build to rebuild the frontend assets.
  5. Start the frontend server with npm run dev (docs reference).

Once the server is running, do a hard refresh in your browser (Ctrl+F5 or Shift+F5) to clear the cache and load the updated UI.

dosubot[bot] avatar Jun 26 '25 04:06 dosubot[bot]

Hi @dosu. After I restarted the frontend server with the command "npm run dev", I got the screen output below. Following it, I accessed RAGFlow at http://localhost:9222 successfully and found that the Top N max value in chat configuration was 1000, which means my modification worked.

Meanwhile, I could still access RAGFlow at the original address http://localhost:80, where the Top N max value in chat configuration was still 30!

I originally deployed RAGFlow with Docker rather than from source code, and I always use http://localhost:80. Today I rebuilt the frontend only for the Top N max modification, which takes effect only at http://localhost:9222.

I don't understand the difference between http://localhost:9222 and http://localhost:80. Both reach the same RAGFlow but show different Top N max values. Which one is the right one for me to use?

Image

Bob123Yang avatar Jun 26 '25 08:06 Bob123Yang

The difference is that http://localhost:9222 is your local development server (reflects your latest code changes), while http://localhost:80 is served by your Docker deployment (uses the static frontend assets built into the Docker image). That’s why you see the new Top N max value only on 9222.

To make your changes appear on http://localhost:80, you need to rebuild the Docker image with your updated frontend code and restart the container. Here’s how:

  1. In your project root, run:

    docker build -f Dockerfile.slim -t infiniflow/ragflow:dev-slim .
    

    (Or use Dockerfile instead of Dockerfile.slim if you want the full image with embedding models.)

  2. Then restart your Docker deployment:

    docker compose -f docker/docker-compose.yml up -d
    

Alternatively, you can copy the rebuilt web/dist directory from your local build into the running container and restart it, but rebuilding the image is the recommended approach for production use.

Once you do this, http://localhost:80 will reflect your changes, and you can use it as your main interface. Your data will not be affected, but it’s always a good idea to back up before rebuilding or restarting containers.

dosubot[bot] avatar Jun 26 '25 08:06 dosubot[bot]

Thanks @dosu, the Docker image build is done.

We may need to modify the docker/.env file to replace the image name with the newly built image name before restarting the Docker containers using docker-compose-gpu.yml.

    # In docker/.env, point RAGFLOW_IMAGE at the newly built image "infiniflow/ragflow:nightly-slim".
    # Before: RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.1-slim
    RAGFLOW_IMAGE=infiniflow/ragflow:nightly-slim
    # Restart the Docker containers using the compose file.
    docker-compose -f docker/docker-compose-gpu.yml up -d

@dosu

Now I can set Top N to the max value of 1000 on the Chat Configuration page and save it. Meanwhile, the document I uploaded to the Knowledge Base was parsed successfully with the Table method and produced 998 chunks. (Note: the document has 999 lines in total; the first line is the header and the remaining 998 lines represent 998 entries, one entry per line.)

I asked the chat assistant to list all entries with the entry ID and summary in a simple table format. After several minutes of thinking, RAGFlow listed only about 130 entries rather than 998.

It seems that changing the Top N max value from 30 to 1000 has taken effect, because RAGFlow returned about 130 entries, more than the 30 entries before. But there is still a big gap between 130 and 998 (the total number of chunks after parsing the document).

Why?

Bob123Yang avatar Jun 30 '25 01:06 Bob123Yang

The main reason you only get about 130 entries (not all 998) is the LLM's context window and token limits. Even with Top N set to 1000, RAGFlow builds the prompt by adding chunks until it reaches about 97% of the model's max token window—anything beyond that is automatically truncated, so large tables will almost never fit in a single response, especially if each entry has more than a few words. This is normal and not a bug—it's a design limitation of LLMs and prompt construction logic, not a hidden cap in RAGFlow itself (source).
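
The truncation described above can be sketched as a simple token-budget loop. This is only an illustration under stated assumptions: the 97% ratio comes from the explanation above, while `estimateTokens` (roughly 4 characters per token), `packChunks`, and the 4096-token window are hypothetical stand-ins, not RAGFlow's prompt-building code or a real tokenizer.

```typescript
// Rough token estimate: about 4 characters per token (an assumption; real tokenizers differ).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Add chunks in ranked order until the budget (a fraction of the context window) is used up.
function packChunks(rankedChunks: string[], maxContextTokens: number, budgetRatio: number = 0.97): string[] {
  const budget = Math.floor(maxContextTokens * budgetRatio);
  const kept: string[] = [];
  let used = 0;
  for (const chunk of rankedChunks) {
    const cost = estimateTokens(chunk);
    if (used + cost > budget) break; // this chunk and everything after it is dropped
    kept.push(chunk);
    used += cost;
  }
  return kept;
}

// 998 synthetic table rows, one short entry per row.
const tableRows: string[] = [];
for (let i = 1; i <= 998; i++) {
  tableRows.push("ID-" + i + ": short summary of the issue goes here");
}

// Hypothetical 4096-token context window.
const keptRows = packChunks(tableRows, 4096);
console.log(keptRows.length + " of " + tableRows.length + " rows fit into the prompt");
```

With a fixed window, the number of rows that fit depends on how many tokens each row costs, which is why shorter entries or a model with a larger context window let more of the table through.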

You can see exactly what was sent to the LLM by clicking the light bulb icon above the latest chat message—this will show you the prompt and which entries were included (source).

To maximize the number of entries, you can try reducing chunk size (so each entry uses fewer tokens), lowering the similarity threshold, or reformatting your prompt to be more concise. If you need all 998 entries, consider batching your queries (e.g., "show entries 1–200", then "201–400", etc.) or exporting the data outside the chat assistant.
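
As a rough sketch of the batching idea, the helper below splits 998 entries into ranges of 200 and builds one question per range. The function names and the prompt wording are hypothetical, and whether the retriever actually returns the requested rows for each question depends on the knowledge base and its retrieval settings.

```typescript
// Split a 1-based range of entries into fixed-size batches.
function batchRanges(totalEntries: number, batchSize: number): Array<[number, number]> {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalEntries; start += batchSize) {
    ranges.push([start, Math.min(start + batchSize - 1, totalEntries)]);
  }
  return ranges;
}

// Turn each range into a question for the chat assistant (the wording is illustrative).
function batchPrompts(totalEntries: number, batchSize: number): string[] {
  return batchRanges(totalEntries, batchSize).map(
    (range) => "List entries " + range[0] + " to " + range[1] + " with their ID and summary as a table."
  );
}

const batchedPrompts = batchPrompts(998, 200);
for (const question of batchedPrompts) {
  console.log(question);
}
```

Each question then only needs its own slice of the table to fit in the context window, at the cost of several chat turns.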

Let me know if you want tips on chunking or batching strategies!

dosubot[bot] avatar Jun 30 '25 01:06 dosubot[bot]

Thank you so much @dosu.

I'm interested in what you said: that I can see exactly what was sent to the LLM by clicking the light bulb icon above the latest chat message, which shows the prompt and which entries were included.

Please help me check whether the prompt shown in the picture below, which I got after clicking the light bulb, is the correct one.

Image

Bob123Yang avatar Jun 30 '25 01:06 Bob123Yang

Done

Bob123Yang avatar Jul 01 '25 01:07 Bob123Yang

I have generated the dist folder, but pulling the image failed when I rebuilt the Docker image. My version is ragflow:v0.20.1. What command should I use? Can I download the image to my local machine separately and then rebuild the image? @dosu

chord1412077 avatar Sep 30 '25 06:09 chord1412077