ragflow [Bug]: RAGFlow does NOT create the ES index if the document has not been parsed, and throws an error: No chunk found

Self Checks

[x] I have searched for existing issues, including closed ones.
[x] I confirm that I am using English to submit this report (Language Policy).
[x] Non-English title submissions will be closed directly (Language Policy).
[x] Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

09f8dfe456a13849cfde06d0982e37c18a808c5b

RAGFlow image version

v0.19.x

Other environment information

os: macOS
hardware: mbp M1 PRO

Actual behavior

I created an empty document in my workspace and clicked into its chunks page. I expected the page to show an empty chunk list without any errors. Instead, the UI showed "No chunk found!" and the logs contained this ES error:

2025-06-12 00:26:08 2025-06-11 20:26:08,902 INFO 18 POST http://es-bigdata-ragflow.internal:80/ragflow_2fd2c03046b111f080d2966d4d3e1e42/_search [status:404 duration:0.014s]
2025-06-12 00:26:08 2025-06-11 20:26:08,902 ERROR 18 ESConnection.search ['ragflow_2fd2c03046b111f080d2966d4d3e1e42'] query: {'query': {'bool': {'filter': [{'term': {'doc_id': 'abd3503c470111f0ab76966d4d3e1e42'}}, {'terms': {'kb_id': ['7e5cba7046fd11f0ae36966d4d3e1e42']}}]}}, 'from': 0, 'size': 1000, 'timeout': '600s', 'track_total_hits': True}
2025-06-12 00:26:08 Traceback (most recent call last):
2025-06-12 00:26:08 File "/ragflow/rag/utils/es_conn.py", line 242, in search
2025-06-12 00:26:08 res = self.es.search(index=indexNames,
2025-06-12 00:26:08 File "/ragflow/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/utils.py", line 446, in wrapped
2025-06-12 00:26:08 return api(*args, **kwargs)
2025-06-12 00:26:08 File "/ragflow/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/__init__.py", line 3836, in search
2025-06-12 00:26:08 return self.perform_request( # type: ignore[return-value]
2025-06-12 00:26:08 File "/ragflow/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 320, in perform_request
2025-06-12 00:26:08 raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
2025-06-12 00:26:08 elasticsearch.NotFoundError: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_2fd2c03046b111f080d2966d4d3e1e42]', ragflow_2fd2c03046b111f080d2966d4d3e1e42, index_or_alias)
2025-06-12 00:26:09 [INFO] 127.0.0.1:40841 - 42707

When I created another KB, uploaded a document to it, and parsed it, the issue went away.

I think this happens because the ES index is not initialized at the document creation stage; instead, it is initialized in the task executor or parse job.
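The behavior I expected (an empty chunk list instead of an error) could be sketched roughly as below. This is not RAGFlow's actual code: `search_chunks_or_empty`, `FakeNotFound`, and `fake_search` are hypothetical names, and the sketch assumes the client error exposes the HTTP status as a `status_code` attribute. The fake search function only stands in for a real client so the example runs without a cluster.

```python
from typing import Callable


def search_chunks_or_empty(search: Callable[..., dict], index: str, query: dict) -> dict:
    """Run `search`, but turn a 404 (missing index) into an empty result."""
    try:
        return search(index=index, body=query)
    except Exception as exc:
        # Assumption: the client error carries the HTTP status in `status_code`.
        # Anything other than a 404 is re-raised unchanged.
        if getattr(exc, "status_code", None) != 404:
            raise
        return {"hits": {"total": {"value": 0, "relation": "eq"}, "hits": []}}


class FakeNotFound(Exception):
    """Hypothetical stand-in for the client's not-found error."""
    status_code = 404


def fake_search(index: str, body: dict) -> dict:
    # Simulates searching an index that was never created.
    raise FakeNotFound(f"no such index [{index}]")


result = search_chunks_or_empty(fake_search, "ragflow_demo", {"query": {"match_all": {}}})
print(result["hits"]["total"]["value"])  # 0
```

With a wrapper like this, the chunks page would render an empty list for an unparsed document, while other search failures would still surface.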

Expected behavior

I think RAGFlow should initialize the ES index as soon as a KB is created, rather than at the parse / task job stage. For example, in one of our use cases we write already-processed chunk data directly into a RAGFlow KB instead of using RAGFlow's parsing capability; we only want to use other RAGFlow features, e.g. the retrieval-related ones.

Steps to reproduce

As mentioned above, create an empty document, do not parse it, and click into the chunks page; you'll see the error.

Additional information

No response

Jun 12 '25 08:06 Colstuwjx

You're correct—currently, RAGFlow only creates the Elasticsearch index when a document is parsed or chunks are inserted, not at document creation. This is why you see a 404/index not found error when trying to access chunks for an unparsed document.

The main reason for this design is that Elasticsearch requires the vector dimension (for dense_vector fields) to be known at index creation, and RAGFlow determines this dimension from the embedding model used during parsing. The mapping config only supports a fixed set of dimensions (512, 768, 1024, 1536), and these can't be changed after the index is created. There is no mechanism in the codebase to update the mapping or add new vector sizes after index creation, so the dimension must be set up front and is enforced strictly by the backend and RAGFlow's logic [source].

If you want to ensure the index exists as soon as a knowledge base or document is created (even before parsing), you would need to trigger index creation at that point. However, you'd have to decide on a default vector dimension (or require the user to specify it), since it can't be changed later. This could be done by modifying the document creation logic (for example, in DocumentService.insert) to call ESConnection.createIdx with a default or user-supplied dimension [source].
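As a rough illustration of creating the index early, here is a minimal sketch. It is not the RAGFlow implementation: `build_mappings`, `ensure_index`, and the `embedding` field name are hypothetical, and the mapping is much simpler than a real one. The `indices.exists` and `indices.create` calls follow the keyword-argument style of the elasticsearch-py 8.x client; the in-memory `FakeES` only mimics those two calls so the sketch runs without a cluster.

```python
def build_mappings(dims: int) -> dict:
    """Mapping with one dense_vector field; `dims` is fixed at creation time."""
    return {
        "properties": {
            "doc_id": {"type": "keyword"},
            "kb_id": {"type": "keyword"},
            # Hypothetical field name, chosen for illustration only.
            "embedding": {"type": "dense_vector", "dims": dims},
        }
    }


def ensure_index(es, index: str, dims: int) -> bool:
    """Create `index` if it is missing. Returns True only when it was created."""
    if es.indices.exists(index=index):
        return False
    es.indices.create(index=index, mappings=build_mappings(dims))
    return True


class FakeIndices:
    """In-memory stand-in for the client's `indices` namespace."""

    def __init__(self) -> None:
        self.store: dict = {}

    def exists(self, index: str) -> bool:
        return index in self.store

    def create(self, index: str, mappings: dict) -> None:
        self.store[index] = mappings


class FakeES:
    def __init__(self) -> None:
        self.indices = FakeIndices()


es = FakeES()
created_first = ensure_index(es, "ragflow_demo", 1024)
created_again = ensure_index(es, "ragflow_demo", 1024)
print(created_first, created_again)  # True False
```

Calling something like `ensure_index` from the knowledge base or document creation path would make the later chunk search hit an existing (empty) index, at the cost of committing to a dimension before any embedding model has run.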

If you go this route, be aware that:

The index will be created even if no chunks are ever inserted.
If the embedding model used later has a different dimension, you'll get a mapping error and need to recreate the index.
There's no support for updating the vector dimension after creation.
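To make the dimension caveat concrete, a pre-insert check along these lines could surface the mismatch with a clear message before Elasticsearch rejects the document. This is only a sketch: `check_vector` is a hypothetical helper, and `SUPPORTED_DIMS` simply repeats the sizes listed above.

```python
from typing import Sequence

# The fixed set of dimensions mentioned above.
SUPPORTED_DIMS = (512, 768, 1024, 1536)


def check_vector(vector: Sequence[float], index_dims: int) -> None:
    """Raise ValueError if `vector` cannot be stored in an index built for `index_dims`."""
    if index_dims not in SUPPORTED_DIMS:
        raise ValueError(f"unsupported index dimension: {index_dims}")
    if len(vector) != index_dims:
        raise ValueError(
            f"embedding has {len(vector)} dims but the index expects {index_dims}; "
            "the index would have to be recreated"
        )


check_vector([0.0] * 1024, 1024)  # matches, no error

try:
    check_vector([0.0] * 768, 1024)
except ValueError as err:
    mismatch_message = str(err)
    print(mismatch_message)
```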

In summary: the current behavior is by design due to Elasticsearch's mapping constraints. To change it, you'd need to trigger index creation earlier and decide how to handle the vector dimension up front. If you want a code pointer or example for where to add this logic, let me know!

Jun 12 '25 08:06 dosubot[bot]

The KB has not been initialized. Upload a sample file to that KB and parse it.

Jun 13 '25 07:06 KevinHuSh