
a problem while trying to process a search request in Elasticsearch

Open szho42 opened this issue 1 year ago • 3 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch name

main

Commit ID

d0a1ffe6e21e81f64b1c9199123be6936dbc4a01

Other environment information

This happens on both the demo site and a local deployment instance.

Actual behavior

On the page https://demo.ragflow.io/knowledge/dataset?id=<...>, after adding a dataset and trying to add text chunks to it via the web UI, the error below is encountered. A possible cause is that the field 'create_time' in the index ragflow_15b4f374f2e011eeae1b0242ac180006 is a text field, while operations like sorting or aggregating require field data. Field data is disabled by default on text fields to optimize performance.

BadRequestError(
    "search_phase_execution_exception",
    meta=ApiResponseMeta(
        status=400,
        http_version="1.1",
        headers={
            "X-elastic-product": "Elasticsearch",
            "content-type": "application/vnd.elasticsearch+json;compatible-with=8",
            "content-length": "2231",
        },
        duration=0.0018017292022705078,
        node=NodeConfig(
            scheme="http",
            host="es01",
            port=9200,
            path_prefix="",
            headers={
                "user-agent": "elasticsearch-py/8.12.1 (Python/3.11.0; elastic-transport/8.12.0)"
            },
            connections_per_node=10,
            request_timeout=10.0,
            http_compress=False,
            verify_certs=True,
            ca_certs=None,
            client_cert=None,
            client_key=None,
            ssl_assert_hostname=None,
            ssl_assert_fingerprint=None,
            ssl_version=None,
            ssl_context=None,
            ssl_show_warn=True,
            _extras={},
        ),
    ),
    body={
        "error": {
            "root_cause": [
                {
                    "type": "illegal_argument_exception",
                    "reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
                }
            ],
            "type": "search_phase_execution_exception",
            "reason": "all shards failed",
            "phase": "query",
            "grouped": True,
            "failed_shards": [
                {
                    "shard": 0,
                    "index": "ragflow_15b4f374f2e011eeae1b0242ac180006",
                    "node": "90aM0LzhTSqdYA-X6yX5mg",
                    "reason": {
                        "type": "illegal_argument_exception",
                        "reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
                    },
                }
            ],
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
                "caused_by": {
                    "type": "illegal_argument_exception",
                    "reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
                },
            },
        },
        "status": 400,
    },
)
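To make the failure mode concrete, here is a minimal sketch of the rule Elasticsearch is enforcing: a sort on a field mapped as text, without fielddata enabled, is rejected. The trace does not show the request or the index mapping, so the sort clause and both mappings below are assumptions, and the helper name unsortable_fields is made up for illustration. The "fixed" mapping simply follows the hint in the error message ("Please use a keyword field instead"); it is not necessarily how RAGFlow resolves the problem.

```python
from typing import Any, Dict, List


def unsortable_fields(properties: Dict[str, Any], sort: List[Dict[str, Any]]) -> List[str]:
    """Return the sort fields that are mapped as text without fielddata.

    These are the fields for which Elasticsearch would answer
    "Fielddata is disabled on [...]".
    """
    bad = []
    for clause in sort:
        for field in clause:
            spec = properties.get(field, {})
            if spec.get("type") == "text" and not spec.get("fielddata", False):
                bad.append(field)
    return bad


# Assumed state of the failing index: create_time ended up as a text field.
broken = {"create_time": {"type": "text"}}
# Hypothetical alternative: declare create_time as keyword before indexing.
fixed = {"create_time": {"type": "keyword"}}
# Assumed sort clause; the real request is not visible in the trace.
sort = [{"create_time": {"order": "desc"}}]

print(unsortable_fields(broken, sort))  # ['create_time']
print(unsortable_fields(fixed, sort))   # []
```

Note that the type of an existing field cannot be changed in place, so an index that already has create_time as text would need to be recreated or reindexed with the corrected mapping.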

Expected behavior

No response

Steps to reproduce

Add a new dataset via the WebUI (successful)
Add a new chunk to the newly created dataset (error).

This happens on both the official demo site and a local deployment testing environment.

Additional information

No response

szho42 avatar Apr 08 '24 03:04 szho42

Could you share a sample of your uploaded file?

KevinHuSh avatar Apr 08 '24 14:04 KevinHuSh

Basically, adding any random text (e.g. 'abcde') as a chunk to a dataset triggers this issue.

The same error also happens when I:

  1. create a new dataset in the web UI
  2. add a new file to the dataset
  3. try to open the new file to add text.

szho42 avatar Apr 08 '24 23:04 szho42

For a local deployment, make sure ES is healthy:

curl http://ip_to_es:9200

For the website, what about signing up for another account?

KevinHuSh avatar Apr 10 '24 02:04 KevinHuSh

For the local deployment, ES works fine. Curling the endpoint returns the following:

{
  "name" : "es01",
  "cluster_name" : "rag_flow",
  "cluster_uuid" : "v_vnHbEfR2m8u8Fkm6xeKg",
  "version" : {
    "number" : "8.11.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "64cf052f3b56b1fd4449f5454cb88aca7e739d9a",
    "build_date" : "2023-12-08T11:33:53.634979452Z",
    "build_snapshot" : false,
    "lucene_version" : "9.8.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
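The root endpoint only confirms that the node is up; it says nothing about how create_time is mapped. A rough sketch of a more targeted check, using Elasticsearch's get-field-mapping endpoint (GET /<index>/_mapping/field/<field>), might look like the following. The base URL is a placeholder, the helper names are made up for illustration, and the sample response is hand-written to match the documented response shape, not captured from the affected cluster.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import urlopen

INDEX = "ragflow_15b4f374f2e011eeae1b0242ac180006"  # index name from the error


def fetch_field_mapping(base_url: str, index: str, field: str) -> Dict[str, Any]:
    """Call GET /<index>/_mapping/field/<field> and return the parsed JSON."""
    with urlopen(f"{base_url}/{index}/_mapping/field/{field}", timeout=10) as resp:
        return json.load(resp)


def field_type(response: Dict[str, Any], index: str, field: str) -> Optional[str]:
    """Extract the mapped type of a field from a get-field-mapping response."""
    entry = response.get(index, {}).get("mappings", {}).get(field)
    if not entry:
        return None  # field is not mapped in this index
    # "mapping" holds a single entry keyed by the leaf field name.
    leaf = next(iter(entry["mapping"].values()))
    return leaf.get("type")


# Hand-written sample of what the endpoint would return if create_time is text.
sample = {
    INDEX: {
        "mappings": {
            "create_time": {
                "full_name": "create_time",
                "mapping": {"create_time": {"type": "text"}},
            }
        }
    }
}

print(field_type(sample, INDEX, "create_time"))  # text

# Against a live cluster (placeholder address, not executed here):
# field_type(fetch_field_mapping("http://ip_to_es:9200", INDEX, "create_time"),
#            INDEX, "create_time")
```

If this reports text for create_time, the error is consistent with the mapping rather than with cluster health.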

On your official demo site, is there any technical difference if I sign up with a different account?

szho42 avatar Apr 11 '24 02:04 szho42