[BUG] PPL cannot properly handle fields with mixed keyword and text data types across multiple indices
What is the bug? PPL cannot properly handle fields with mixed keyword and text data types across multiple indices. Currently, when PPL merges field types, it randomly selects one. For example, if the msg field is mapped as both text and keyword across indices, PPL may select keyword. As a result, when PPL attempts to fetch the msg value from doc_values on each shard, it will return null for documents where msg is defined as a text field, since text fields do not have doc_values.
How can one reproduce the bug?
- Prepare dataset, create 2 indices, msg field type is Text in index1, and keyword in index2.
###
PUT {{baseUrl}}/log00001
Content-Type: application/x-ndjson
{
"mappings": {
"properties": {
"msg": {"type": "text"}
}
}
}
###
PUT {{baseUrl}}/log00002
Content-Type: application/x-ndjson
{
"mappings": {
"properties": {
"msg": {"type": "keyword"}
}
}
}
###
POST {{baseUrl}}/log00001/_bulk
Content-Type: application/x-ndjson
{"index": { "_id": 1 }}
{"msg": "status=200", "index": 1}
###
POST {{baseUrl}}/log00002/_bulk
Content-Type: application/x-ndjson
{"index": { "_id": 1 }}
{"msg": "status=200", "index": 2}
- Run query, we expected get 2 results. but only get 1 results back
### Expected
POST {{baseUrl}}/_plugins/_ppl/
Content-Type: application/x-ndjson
{
"query": "source=log*"
}
"datarows": [
[
"status=200",
1
],
[
"status=200",
2
]
]
### Unexpected
POST {{baseUrl}}/_plugins/_ppl/
Content-Type: application/x-ndjson
{
"query": "source=log* | rex field=msg 'status=(?<statusCode>\\d+)' | where statusCode=200"
}
"datarows": [
[
"status=200",
2,
"200"
]
]
What is the expected behavior? Return 2 results
What is your host/environment?
- OS: [e.g. iOS]
- Version [e.g. 22]: Main
- Plugins
Additional Analysis and Information
The issue arises when executing PPL queries against wildcard indices with conflicting field mappings:
POST {{baseUrl}}/_plugins/_ppl/
Content-Type: application/x-ndjson
{
"query": "source=log* | rex field=msg 'status=(?<statusCode>\\d+)' | where statusCode=200"
}
Expected Result: Multiple matching documents across indicesActual Result: Limited results due to silenced shard failures:
{
"datarows": [
[
"status=200",
2,
"200"
]
]
}
By default, OpenSearch silences partial shard failures and returns available results. To expose the underlying errors, disable partial results:
PUT {{baseUrl}}/_cluster/settings
Content-Type: application/x-ndjson
{
"transient": {
"search.default_allow_partial_results": "false"
}
}
Root Cause Analysis
- Non-Deterministic Mapping Merge
When querying wildcard indices, OpenSearch merges field mappings from all matching indices. The critical issue: when field types conflict, the selection is non-deterministic due to random ordering of the index mapping list. Code Link
- Impact: A field may be treated as int or keyword randomly, depending on which index mapping is processed last
- Calcite Script Engine Optimization Conflict
The Calcite script engine optimizes text field performance by treating them as keywords when multifields are present, accessing doc_values for better performance. Code Link
-
Problem: When the merged mapping indicates keyword type but an index actually has text-only fields, doc_values access fails.
Concrete Example
Index: logs-1
"body": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
Index: logs-2
"body": {
"type": "text",
"norms": false
}
The Problem: Even though both indices define body as text, the presence of multifields in logs-1 causes intermittent failures. During mapping merge:
- If logs-1 mapping is selected → Script engine assumes keyword type available
- Script engine attempts doc_values access for performance
- When processing logs-2 documents → Failure occurs (no keyword field/doc_values)
Action Items apart from bug fix
- Run Integration Tests (ITS) with search.default_allow_partial_results=false to uncover any existing partial shard failure scenarios across the codebase
I will look into it.
@penghuo have any conclusion on this? do we need more discussions?
- Short-term solution: Implement a fast-fail mechanism when incompatible field index types are detected (e.g., keyword vs. text).
- Long-term solution:
- Option 1: Enable PPL to execute the query plan on each shard independently. This allows PPL to tolerate incompatible field index types across indices. For example, in an expression like REX(field=msg, 'status=(?<statusCode>\d+)'), PPL can generate a docValues fetch operation on shards where the field type is keyword, and a _source fetch operation on shards where the field type is text.