"failed to parse field [error.grouping_name] of type [keyword] in document" after upgrading to 8.15
Full error:
{
"error": {
"root_cause": [
{
"type": "document_parsing_exception",
"reason": "[1:2013] failed to parse field [error.grouping_name] of type [keyword] in document with id 'OcTp4ZEBkEnqNYvJrysL'. Preview of field's value: '14 UNAVAILABLE: '"
}
],
"type": "document_parsing_exception",
"reason": "[1:2013] failed to parse field [error.grouping_name] of type [keyword] in document with id 'OcTp4ZEBkEnqNYvJrysL'. Preview of field's value: '14 UNAVAILABLE: '",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Cannot index data directly into a field with a [script] parameter"
}
},
"status": 400
}
APM Server version (apm-server version): Upgrade from pre-8.15 to 8.15+
Description of the problem including expected versus actual behavior: indexing errors affects apm-server upgraded from pre-8.15 to 8.15+, where reroute processor is used in logs-apm.error@custom.
Steps to reproduce:
Here's a minimal reproducible example without actually upgrading
PUT _index_template/logs-foo
{
"data_stream": {},
"index_patterns": ["logs-foo-*"],
"composed_of": ["logs-foo@mappings"],
"ignore_missing_component_templates": ["logs-foo@mappings"],
"template": {
"settings": {
"index.default_pipeline": "logs-foo@old-pipeline"
}
},
"priority": 250
}
PUT _component_template/logs-foo@mappings
{
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"foo": {
"type": "keyword"
}
}
}
}
}
PUT _ingest/pipeline/logs-foo@old-pipeline
{
"processors": [
{
"set": {
"field": "foo",
"value": "old"
}
},
{
"pipeline": {
"name": "logs-foo@pipeline"
}
}
]
}
PUT _ingest/pipeline/logs-foo@pipeline
{
"processors": [
]
}
POST logs-foo-default/_doc
{
"@timestamp": "1970-01-01T00:00:00.000Z"
}
GET logs-foo-*/_search
{
"fields": ["foo"]
}
PUT _index_template/logs-foo
{
"data_stream": {},
"index_patterns": ["logs-foo-*"],
"composed_of": ["logs-foo@mappings"],
"ignore_missing_component_templates": ["logs-foo@mappings"],
"template": {
"settings": {
"index.default_pipeline": "logs-foo@pipeline"
}
},
"priority": 250
}
PUT _component_template/logs-foo@mappings
{
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"foo": {
"type": "keyword",
"script": {
"source": "emit(\"new\")",
"lang": "painless"
}
}
}
}
}
}
PUT _ingest/pipeline/logs-foo@pipeline
{
"processors": [
{"reroute": {
"namespace": ["mynamespace"]
}}
]
}
POST logs-foo-default/_doc
{
"@timestamp": "1970-01-01T00:00:00.000Z"
}
Error from this example:
{
"error": {
"root_cause": [
{
"type": "document_parsing_exception",
"reason": "[1:120] failed to parse field [foo] of type [keyword] in document with id 'kWE245EB4mZ7zYVt-HD2'. Preview of field's value: 'old'"
}
],
"type": "document_parsing_exception",
"reason": "[1:120] failed to parse field [foo] of type [keyword] in document with id 'kWE245EB4mZ7zYVt-HD2'. Preview of field's value: 'old'",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Cannot index data directly into a field with a [script] parameter"
}
},
"status": 400
}
The reason behind the bug, e.g. upgrading from 8.13.2 to 8.15.0
- apm-server always (unless
apm-server.data_streams.namespaceis set in apm-server.yml) issues the bulk request tologs-apm.error-default. It will be subject to ingest pipeline of current backing index of DSlogs-apm.error-default. - 8.13.2 uses ingest pipeline to set
error.grouping_name, while 8.15.0 sets it within a script in the mapping. - 8.13.2 uses ingest pipeline
logs-apm.error-8.13.2, while 8.15.0 uses ingest pipelinelogs-apm.error@default-pipeline. Both ingest pipeline referenceslogs-apm.error@customwhere the reroute processor is placed. - Let's say reroute processor creates
logs-apm.error-some_custom_namespace - after 8.15 upgrade,
logs-apm.error-defaultis not rolled over whilelogs-apm.error-some_custom_namespaceis.
The state of the system after the above:
- Current backing index of
logs-apm.error-defaultis from 8.13.2, which setserror.grouping_name, then callslogs-apm.error@customwhich performs reroute. - The event is reroute to
logs-apm.error-some_custom_namespace, where backing index has the mapping from 8.15.0, where it setserror.grouping_nameusing a script in mapping (not ingest pipeline). - It is forbidden to use a script in mapping for
error.grouping_name, when the _source already containserror.grouping_name, from the 8.13.2 ingest pipeline. Therefore, indexing fails with reason"Cannot index data directly into a field with a [script] parameter".
after 8.15 upgrade, logs-apm.error-default is not rolled over while logs-apm.error-some_custom_namespace.
That seems to be the crux of the issue. I suspect the issue is that we're relying on lazy rollover, which I suppose does not trigger when rerouting. I'll check on this.
Confirmed above hypothesis:
- Create an index template which sets a default ingest pipeline with reroute
- Create a data stream matching the index template
- Send a document to the data stream; it will be rerouted
- Create another index template with higher priority with the same index pattern, with no default ingest pipeline
- Rollover the source data stream with "lazy=true"
- Send a document to the data stream; it will still be rerouted
- Rollover the source data stream with "lazy=false"
- Send a document to the data stream; it will not be rerouted
https://github.com/elastic/elasticsearch/issues/112781
With https://github.com/elastic/elasticsearch/issues/112781 being fixed, is there anything else that needs to be implemented directly in the APM Server or does this just require some testing?
We should test it, but I don't think there's anything more to implement for this.
moving this from it105 to it106
Given that 8.15 is now not supported anymore, what should still be tested is that this kind of error doesn't happen anymore once a deployment is upgraded to > 8.15. I suggest to run a quick manual test with
(1) recreating the problem shared in the description
(2) upgrade to 8.17.last, ingest data and confirm that the error is not occuring anymore
For step (1), likely 8.15.0 or 8.15.1 would be the best versions for reproducing the issue as several issues were fixed in the last minor versions of 8.15. (reach out if you need help reproducing).
v8.15.0
I do get the same error as @carsonip outlined:
{
"error": {
"root_cause": [
{
"type": "document_parsing_exception",
"reason": "[1:120] failed to parse field [foo] of type [keyword] in document with id 'NLh7JZUB20ThadfQMTAd'. Preview of field's value: 'old'"
}
],
"type": "document_parsing_exception",
"reason": "[1:120] failed to parse field [foo] of type [keyword] in document with id 'NLh7JZUB20ThadfQMTAd'. Preview of field's value: 'old'",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Cannot index data directly into a field with a [script] parameter"
}
},
"status": 400
}
v8.15.2
There is no available v8.15.1 on GCP. I also check the region that contains the SNAPSHOTS. Give the same issue
{
"error": {
"root_cause": [
{
"type": "document_parsing_exception",
"reason": "[1:120] failed to parse field [foo] of type [keyword] in document with id '_n-AJZUB4qKmXkI0gAWF'. Preview of field's value: 'old'"
}
],
"type": "document_parsing_exception",
"reason": "[1:120] failed to parse field [foo] of type [keyword] in document with id '_n-AJZUB4qKmXkI0gAWF'. Preview of field's value: 'old'",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Cannot index data directly into a field with a [script] parameter"
}
},
"status": 400
}
v8.17.2
Issue seems to be resolved
{
"_index": ".ds-logs-foo-default-2025.02.20-000001",
"_id": "yqKBJZUBAXHl19DUaCaE",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1
}
I'll go ahead and close this.
@rubvs just to confirm - you tested with upgrading a deployment from 8.15 to 8.17.2, not creating a new deployment on 8.17.2, correct?
Thanks for double checking with me @simitt. Yes, let me outline my steps. I double checked it by also testing Andrew's method:
Test 1: Carson Approach
- Start with v8.15.0
- Run the steps outlined in this PR desc
- Verify that the error does occur when trying to reroute:
PUT _ingest/pipeline/logs-foo@pipeline
{
"processors": [
{"reroute": {
"namespace": ["mynamespace"]
}}
]
}
POST logs-foo-default/_doc
{
"@timestamp": "1970-01-01T00:00:00.000Z"
}
- Manually upgrade to 8.17.2 via Kibana
- Re-add the doc and ensure it is indeed rerouted to the correct namespace.
# 1. Here, we are adding a doc to "default" namespace
POST logs-foo-default/_doc
{
"@timestamp": "1970-01-01T00:00:00.000Z"
}
# 2. But the doc is successfully reroute to "mynamespace" namespace
{
"_index": ".ds-logs-foo-mynamespace-2025.02.22-000001",
"_id": "54rTL5UBhd3e6_WlhwSD",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 2
}
- That is to say, upgrading from 8.15.0 to 8.17.2 solved the issue
- Repeat starting with 8.15.2 & 8.16.4
- In 8.16.4, the issue is fixed WITHOUT needing an upgrade.
Test 2: Andrew Approach
- Start with v8.15.0
- Run all steps outlined my Andrew's approach in PR
- Verify that Step 6 does in fact STILL reroute to the namespace (
fooin this case).- ".ds-demo-dataset-foo-2025.02.22-000002"
- Manually upgrade to 8.17.2 via Kibana
- Rerun Step 6 to verify the doc is NOT rerouted.
- ".ds-demo-dataset-default-2025.02.22-000002"
- That is to say, upgrading from 8.15.0 to 8.17.2 solved the issue
- Repeat starting with 8.15.2 & 8.16.4
- In 8.16.4, the issue is fixed WITHOUT needing an upgrade.
Thanks for confirmation and providing the testing details!
Hi, I'm working on potentially adding a test case for this in functional tests: https://github.com/elastic/apm-server/issues/14061. May I know what would be the reproducible steps of the original error? Simply create reroute processor in logs-apm.error@custom and upgrade, then check ingest?
@ericywl pretty much, yes. High level, the problem was that in the impacted versions, there was no automatic rollover triggered on update due to the lazy rollover. With lazy rollover usually the rollover happens on first write request, but this wasn't triggered with the reroute processor, therefore no rollover happened and new settings/mappings weren't picked up.