Grok processor slows down APM ingest pipeline
@joegallo investigated ingest pipeline performance in the wild and found out that this processor accounts for 25% of the ingest pipeline cost:
https://github.com/elastic/apm-server/blob/b75320b4220df7aec8309fa6d62c43c29404ece6/apmpackage/cmd/genpackage/pipelines.go#L69-L78
Is this still needed and if so, can we find an alternative that's not based on grok?
It'd be worth testing this as a dissect processor -- not altogether that different in terms of syntax and use, but I'd expect it to have better performance.
Given the grok --> fail --> remove pattern, though, I bet the very best performance would be a single fail with an annoyingly complex if.
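To make the dissect suggestion concrete, here is a minimal sketch in the map-based style that pipelines.go uses. It assumes the grok processor only splits a dotted version string into numeric parts; the field name `observer.version`, the target keys `version_major`, `version_minor`, and `version_rest`, and the helper `dissectVersionProcessors` are hypothetical and not taken from the actual pipeline. Since dissect produces strings, convert processors follow it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dissectVersionProcessors returns hypothetical ingest processor definitions
// that split a "major.minor.rest" version string with dissect instead of grok.
// All field and key names are assumptions for illustration only.
func dissectVersionProcessors(field string) []map[string]interface{} {
	return []map[string]interface{}{
		{
			// dissect splits on the literal "." delimiters without regex matching.
			"dissect": map[string]interface{}{
				"field":          field,
				"pattern":        "%{version_major}.%{version_minor}.%{version_rest}",
				"ignore_missing": true,
			},
		},
		{
			// dissect yields strings, so convert the parts needed for numeric comparison.
			"convert": map[string]interface{}{
				"field":          "version_major",
				"type":           "integer",
				"ignore_missing": true,
			},
		},
		{
			"convert": map[string]interface{}{
				"field":          "version_minor",
				"type":           "integer",
				"ignore_missing": true,
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(dissectVersionProcessors("observer.version"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Whether this is actually faster than grok for this pipeline would still need to be measured, and the single-`fail` variant mentioned above would avoid the temporary fields entirely.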
While looking at this, we should consider whether it's worth moving to a separate version field describing the data schema version (see https://github.com/elastic/apm-server/issues/10308#issuecomment-1437724753), rather than the server version. Maybe that way we could avoid string parsing altogether.
Is this still relevant after https://github.com/elastic/elasticsearch/pull/97546/? Can we just remove the pipeline once that's merged?
@kruskall yes, I think we would get rid of the grok processor when we move the templates to ES.
@felixbarny
Hi Felix. Joe G. was recently on a call with a customer with whom you have had a couple of sessions about APM, and he felt this issue was related to the performance and other APM-related problems they have had. We have shared this case number with the customer. Could you let us know when this issue might be fixed, and whether there are any workarounds? Thanks in advance.
We're currently working on setting up index templates directly in Elasticsearch rather than via Fleet for APM (https://github.com/elastic/elasticsearch/pull/97546). After that is done, we'll not use that expensive grok processor anymore.
I don't think there are workarounds for the time being but we're actively working on this at the moment.
@felixbarny Thank you for the great news. If possible, could you tell us which version we can expect the fix in? Thanks again.
I'll need to defer that question to @simitt and @axw who are not available this week.
We are aiming for the setup to be moved to ES in 8.12, but cannot make any promises on timelines at this point.
The grok processor was removed in https://github.com/elastic/integrations/pull/9185
cc @simitt I believe we can close this