elasticsearch
elasticsearch copied to clipboard
semantic_text ingestion: Change ActionFilter to interface
This is a follow-up to https://github.com/elastic/elasticsearch/pull/108102 created after talking to Armin:
- Avoids using an
ActionFilter
- Creates a new
BulkShardOperationProcessor
interface for processingBulkShardRequests
- A new service
BulkShardOperationService
is included in PluginServices. This service can be used to set aBulkShardOperationProcessor
from the InferenceService. -
BulkShardOperationService
is bound to Guice as the implementation forBulkShardOperationProcessor
. It internally uses theBulkShardOperationProcessor
to process theBulkShardRequest
.
I tried setting the BulkShardOperationProcessor
directly into the TransportService
, but I couldn't:
-
TransportService
is created after pluginscreateComponents()
- The provider approach that is used in
SearchableSnapshots
would not work, as there is no method invoked in the plugin that would allow the provider to be used for this functionality. - I opted for the easy route of having something bound for the
TransportBulkAction
. The service used implements theBulkShardOperationProcessor
for convenience and directly invoking the apply method on it, regardless of whether aBulkShardOperationProcessor
has been registered or not.
@elasticmachine test elasticsearch-ci/packaging-tests-unix-sample
@elasticmachine test elasticsearch-ci/rhel-8
@elasticmachine update branch
Pinging @elastic/es-search (Team:Search)
Can you expand more on what is trying to be achieved here?
Avoids using an ActionFilter
Why do you want to avoid action filters? Specifically it looks like a MappedActionFilter is used already, which shouldn't have performance penalties that normal action filters do.
A new service BulkShardOperationService is included in PluginServices. This service can be used to set a BulkShardOperationProcessor from the InferenceService.
That sounds very hacky. Setting effectively statics late in initialization is something we should avoid since ordering changes can then break expectations about the static being set.
Hey @rjernst ! Thanks for the feedback
Why do you want to avoid action filters? Specifically it looks like a MappedActionFilter is used already, which shouldn't have performance penalties that normal action filters do.
This is the result from a conversation with @original-brownbear. Armin was worried about using an ActionFilter
for clarity. As this is not a cross-cutting concern (like authN/Z), it became a bit difficult to understand where inference was actually being done.
He had concerns about performance as well, but I believe he's at ease with that now.
That sounds very hacky. Setting effectively statics late in initialization is something we should avoid since ordering changes can then break expectations about the static being set.
I understand. We don't expect more processors like these to be added - we could deal with potential ordering / priority if that case arises.
This seemed a more lightweight option for adding a bulk shard request processor from a plugin than doing an actual BulkShardRequestProcessorPlugin
interface. That would probably be the more straightforward way of achieving this, at the expense of creating a new plugin interface that would only be implemented by a single class.
I'm open for ideas here - let's hear as well @original-brownbear opinions on this.
we could deal with potential ordering / priority if that case arises.
Note that I do not mean ActionFilter.order(). I mean order of construction/initialization in NodeConstruction. Things get reordered in there often, it's quite messy.
What is still unclear to me is why this move away from using a SetOnce (as is common in many other parts of ES init) from createComponents is desired. This effectively moves the SetOnce of the processor into a global static, instead of being local to the inference plugin as it is now. But I do not see what that achieves.
What is still unclear to me is why this move away from using a SetOnce (as is common in many other parts of ES init) from createComponents is desired. This effectively moves the SetOnce of the processor into a global static, instead of being local to the inference plugin as it is now. But I do not see what that achieves.
The main idea behind using a different class (BulkShardOperationService
) to hold the BulkShardOperationInferenceProcessor
instance was to be able to provide a default implementation in case the InferencePlugin is not included. That way, we always register the BulkShardOperationService as part of the node construction, and include there the BulkShardOperationInferenceProcessor
from the InferencePlugin
if it's present.
We could provide the BulkShardOperationInferenceProcessor
as a component constructed in InferencePlugin
, but how would we allow for its value to be null
in case the InferencePlugin
is not present? I haven't been able to think of a better alternative.
We could provide the BulkShardOperationInferenceProcessor as a component constructed in InferencePlugin, but how would we allow for its value to be null in case the InferencePlugin is not present?
Perhaps I'm still missing something. Why is the InferencePlugin not being present an issue? In this case the processor would not be present either (at least how it is named, with the InferencedProcessor suffix, seems to imply they are intertwined), right?
Sorry @rjernst , let me summarize the changes done here to understand what are you referring to:
- A new
BulkShardOperationInferenceProcessor
is used to perform the inference instead of using an ActionFilter, for the reasons described - We want to inject it into
TransportBulkAction
so it's eventually available inBulkOperation
. In case InferencePlugin is not present, this would make it a null value that will fail injection. - So I created a
BulkShardOperationService
that acts as a holder for theBulkShardOperationInferenceProcessor
. This is always created on node construction, so it won't fail injection. - The InferencePlugin creates the instance of
BulkShardOperationInferenceProcessor
and injects it intoBulkShardOperationService
to make it available downstream.
I couldn't find another way of injecting a potentially null value in an action, that's why I did the workaround of using a new service so we can use it to inject a processor conditionally.
Is there a better alternative for achieving this? We can also do a quick sync if you prefer.
Thanks!
@rjernst @carlosdelest sorry missed hitting "comment" here. My point about not using an action filter was not all that much about the mechanics of injecting the plugin logic but more about the resulting flow. I think it'd be much better to have this kind of filter on the bulk indexing explicitly in the bulk indexing code instead of having it hidden in an action filter. This is going to be very surprising and hard to debug for people not involved in this effort now down the line. The logic is already very hard to follow with all the code being visible directly, it's going to be infinitely worse if it's not immediately obvious how we get from the raw bulk request to the one with the logic in here applied?
The logic is already very hard to follow with all the code being visible directly
I think this is harder to follow than the existing ActionFilter. The ActionFilter is a known technique. This is setting a global static in a sneaky way.
@rjernst @original-brownbear , I see your points of view. In my mind, the best way going forward would be to have an alternative way of injecting the BulkShardOperationProcessor
in the TransportBulkAction
that does not rely on doing something sneaky like updating a service included in PluginServices
.
The problem, as mentioned, is that the InferencePlugin
might not be registered, so we can't rely on having a non-null BulkShardOperationProcessor
object to inject into the TransportBulkAction
.
For that purpose, I can only think on having a separate BulkActionProcessorPlugin
interface that defines a getBulkShardOperationProcessor()
method, and inject all processors defined in plugins in the NodeConstruction class explicitly in a service, similar to how the HierarchyCircuitBreakerService works.
Would implementing a new Plugin interface make sense to you both? Do you see another alternative for injecting this behaviour?
I'm happy to keep using the ActionFilter
as well - we may document it in the different places that depend on it (SemanticTextFieldMapper
and BulkOperation
).
Another option would be to do the refactoring proposed in this PR after the inference dependencies have been moved to server
(something that will eventually happen). That way we won't need to inject them at all as we will be able to use them directly from BulkOperation.
To summarize the options:
- Create a new
BulkActionProcessorPlugin
interface (or a different injection mechanism) - Use the current
ActionFilter
way - Use the current
ActionFilter
way, do a refactoring when inference services move toserver
What are your thoughts @original-brownbear @rjernst ?
@rjernst , @original-brownbear , any thoughts on the above options?
- Create a new BulkActionProcessorPlugin interface (or a different injection mechanism)
- Use the current ActionFilter way
- Use the current ActionFilter way, do a refactoring when inference services move to server
I'm closing this PR as there's no feedback to move it forward.
We can re-open again once inference related code has been moved to server
.