Operations to system indices should always use system threadpools
System threadpools are meant to be used for operations on system indices. For example, the system_critical_write threadpool should be used for writing to the .security and .security-tokens indices. However, the threadpool switching happens at the shard level. At the index (coordinating) level, these operations still share the same threadpool as operations on non-system indices.
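For illustration, here is a minimal sketch of that intended mapping, assuming a hypothetical selectWriteExecutor helper; the pool names are the real constants from org.elasticsearch.threadpool.ThreadPool.Names, but this is not the actual ExecutorSelector logic:

```java
// Minimal sketch of the intended per-index pool mapping; selectWriteExecutor
// is a hypothetical helper, not the actual ExecutorSelector API.
import org.elasticsearch.threadpool.ThreadPool;

class WritePoolSelection {
    static String selectWriteExecutor(boolean isSystemIndex, boolean isCritical) {
        if (isSystemIndex == false) {
            return ThreadPool.Names.WRITE;               // ordinary data indices
        }
        return isCritical
            ? ThreadPool.Names.SYSTEM_CRITICAL_WRITE     // e.g. .security, .security-tokens
            : ThreadPool.Names.SYSTEM_WRITE;             // other system indices
    }
}
```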
For heavy ingestion use cases, e.g. Fleet, if the write threadpool gets saturated, it leads to 429 rejection errors for system-critical writes. For example, when the write threadpool is saturated, users won't be able to create or invalidate API keys or OAuth2 tokens. A sample rejection error is as follows:
```
[es_rejected_execution_exception: [es_rejected_execution_exception] Reason: rejected execution of org.elasticsearch.action.bulk.TransportBulkAction$1/org.elasticsearch.action.ActionListener$RunBeforeActionListener/org.elasticsearch.action.ActionListener$DelegatingFailureActionListener/org.elasticsearch.action.support.ContextPreservingActionListener/org.elasticsearch.tasks.TaskManager$1{SafelyWrappedActionListener[listener=WrappedActionListener{org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction$$Lambda$8846/0x00000008020a4f58@50921e21}{org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction$$Lambda$8849/0x00000008020a5378@63f4b3b9}]}{Task{id=1055264, type='transport', action='indices:data/write/bulk', description='requests[1], indices[.security-tokens]', parentTask=unset, startTime=1658399095698, startTimeNanos=21774904411368219}}/org.elasticsearch.xpack.security.action.filter.SecurityActionFilter$$Lambda$6237/0x0000000801d90e58@61953e36/org.elasticsearch.action.bulk.TransportBulkAction$$Lambda$8017/0x0000000801f9d000@26a0d988 on EsThreadPoolExecutor[name = instance-0000000001/write, queue capacity = 10000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6ed3e9a8[Running, pool size = 3, active threads = 3, queued tasks = 10000, completed tasks = 429077]]]
```
This should not happen, because the system critical threadpool was introduced precisely to avoid it. Though the above example is about the .security-tokens index and the system_critical_write threadpool, it is reasonable to believe this issue applies to all system indices and system threadpools.
Pinging @elastic/es-core-infra (Team:Core/Infra)
I wonder if the fix in the transport bulk action should be to fork conditionally based on the current thread (i.e. don't fork if we're already executing on system_critical_write), with the expectation that the code calling this would fork before calling the transport bulk action. That feels less expensive than iterating through all the bulk requests and comparing them to the known system indices to decide which threadpool to fork to, and it keeps the forking logic closer to the actual usage.
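A rough sketch of what that conditional fork could look like, under the assumption that the current pool can be recognized from the thread name (Elasticsearch worker threads are named like "elasticsearch[node][system_critical_write][T#1]"); dispatch and executeBulk are stand-ins, not the real transport code:

```java
// Rough sketch, not the real TransportBulkAction fix: skip the fork when we
// are already on the critical system pool. Detecting the pool via the thread
// name is a simplification for illustration.
import org.elasticsearch.threadpool.ThreadPool;

class ConditionalFork {
    static void dispatch(ThreadPool threadPool, Runnable executeBulk) {
        boolean onCriticalPool = Thread.currentThread().getName()
            .contains("[" + ThreadPool.Names.SYSTEM_CRITICAL_WRITE + "]");
        if (onCriticalPool) {
            // The caller already forked before invoking the bulk action.
            executeBulk.run();
        } else {
            threadPool.executor(ThreadPool.Names.WRITE).execute(executeBulk);
        }
    }
}
```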
We talked about this issue at the core/infra sync, and our naive thought was that this was an oversight. Looking at the code, though, I can see why using an ExecutorSelector here would get messy. We already look up indices for the request in a sorted map and find whether they are system indices or not. If we wanted to select between the system_write and system_critical_write thread pools, we would have to take the names of those system indices and look up which thread pool each index is supposed to use, then decide which thread pool to use for the bulk request. If we followed the current behavior, we'd choose the "least critical" thread pool.
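To make the messiness concrete, a hypothetical "least critical" reduction over the indices of a bulk request might look like the sketch below, where criticalityOf stands in for the per-index SystemIndexDescriptor lookup; none of these helpers are the real ExecutorSelector API:

```java
// Illustrative only: reduce the indices of a bulk request to the least
// critical write pool among them. criticalityOf is a hypothetical stand-in
// for the descriptor lookup described above.
import java.util.List;
import org.elasticsearch.threadpool.ThreadPool;

class LeastCriticalPool {
    // Ordered from least to most critical.
    private static final List<String> ORDER = List.of(
        ThreadPool.Names.WRITE,
        ThreadPool.Names.SYSTEM_WRITE,
        ThreadPool.Names.SYSTEM_CRITICAL_WRITE);

    static String leastCriticalWritePool(List<String> indices) {
        String result = ThreadPool.Names.SYSTEM_CRITICAL_WRITE;
        for (String index : indices) {
            String pool = criticalityOf(index);
            if (ORDER.indexOf(pool) < ORDER.indexOf(result)) {
                result = pool; // downgrade to the least critical pool seen
            }
        }
        return result;
    }

    // Hypothetical stand-in for looking up the descriptor's thread pool.
    static String criticalityOf(String index) {
        if (index.startsWith(".security")) return ThreadPool.Names.SYSTEM_CRITICAL_WRITE;
        if (index.startsWith(".")) return ThreadPool.Names.SYSTEM_WRITE;
        return ThreadPool.Names.WRITE;
    }
}
```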
If we were forking the thread, that would happen when a system feature creates a bulk transport request directly, right? For example, ApiKeyService#createApiKeyAndIndexIt would fork the thread before calling executeAsyncWithOrigin?
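Something like this hedged sketch, where forkThenIndex is a hypothetical helper and the Runnable would wrap the ClientHelper#executeAsyncWithOrigin call:

```java
// Hedged sketch, not the real ApiKeyService code: fork onto the critical
// system pool first, then run the write. forkThenIndex is hypothetical.
import org.elasticsearch.threadpool.ThreadPool;

class ForkBeforeWrite {
    static void forkThenIndex(ThreadPool threadPool, Runnable bulkWriteWithOrigin) {
        threadPool.executor(ThreadPool.Names.SYSTEM_CRITICAL_WRITE).execute(bulkWriteWithOrigin);
    }
}
```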
If we did that, then any external request to the bulk endpoint would not be able to use system threadpools, right? I don't know exactly how Fleet behaves, but, for example, if Fleet writes to its system indices with a REST call to the bulk endpoint, would everything go to the WRITE threadpool?
> for example, if Fleet writes to its system indices with a REST call to the bulk endpoint
Why would Fleet (or any other external client) write to system indices directly via the normal write path? Shouldn't any request that results in writes to system indices go through dedicated endpoints? A dedicated endpoint would allow us to disambiguate (or pre-fork) which threadpool to use.
Fleet and Kibana both use normal APIs to access their system resources. We call this kind of system index an "external" system index. From the Javadoc on SystemIndexDescriptor.Type:
> System indices can also belong to features outside of Elasticsearch that may be part of other Elastic stack components. These are external system indices as the intent is for these to be accessed via normal APIs with a special value.
We can detect this case, albeit unreliably, if the ThreadContext is passed around correctly: there's a ThreadContext header called _external_system_index_access_origin that contains the product origin, and we could use that to divert to system thread pools. However, this header can be faked in an HTTP request with X-elastic-product-origin.
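As a hedged sketch of that idea (the header name is copied verbatim from above; selectPool is a hypothetical decision point, while ThreadContext#getHeader is the real API):

```java
// Sketch only: divert to a system pool based on the origin header. The value
// ultimately derives from the client-supplied X-elastic-product-origin HTTP
// header, which can be faked, so this is unreliable by construction.
import org.elasticsearch.common.util.concurrent.ThreadContext;
import org.elasticsearch.threadpool.ThreadPool;

class OriginBasedSelection {
    static String selectPool(ThreadContext threadContext) {
        String origin = threadContext.getHeader("_external_system_index_access_origin");
        return origin != null ? ThreadPool.Names.SYSTEM_WRITE : ThreadPool.Names.WRITE;
    }
}
```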
Closing this since it should be fixed by #106150. I do not fully follow the conversation here, but I assume it is because this refers to older versions of the code. Please reopen if you think it is not fully solved by the fix.