elasticsearch
[Transform] Transforms with unattended flag don't create destination index unless all conditions/fields exist in source index
Description
We added the unattended flag to transforms shipped in integration packages (example: https://github.com/elastic/integrations/pull/8320).
In the past, without the unattended flag, once the package is installed on a fresh cluster:
- Transform is installed
- Destination index is created on install
- .latest and .all aliases for the destination index are created (see https://github.com/elastic/kibana/pull/142920)
Now, with the unattended flag, on a fresh cluster:
- Transform is installed
- Destination index doesn't seem to be created successfully
After testing on v8.11.1 (so that the fix in https://github.com/elastic/elasticsearch/pull/101627 would be included), transforms with the unattended flag don't seem to create the destination index the way they do without the flag.
It turns out the destination index is only created once there is data that matches the transform's criteria (e.g. fields host.name, destination.ip, etc. exist in logs-*). Before, the destination index was created regardless of the available data. This gives the impression that the package hasn't been fully installed.
What we want to clarify is: Is this expected behavior with the unattended flag?
If so, can it be implemented so the behavior is the same as before (create destination index regardless of available data) so that it's clearer to users when the transform and associated indices have been created?
Related links
- https://github.com/elastic/elasticsearch/pull/101627
- https://github.com/elastic/endpoint-package/pull/401
- https://github.com/elastic/integrations/pull/8320
Pinging @elastic/ml-core (Team:ML)
Hi,
In the past, without the unattended flag, once the package is installed on a fresh cluster: Transform is installed Destination index is created on install .latest and .all aliases for the destination index is created
Confirmed, that's the correct behavior with unattended set to false.
After testing on v8.11.1 (so that this fix https://github.com/elastic/elasticsearch/pull/101627 would be there), transforms with the unattended flag don't seem to create the destination index like without the unattended flag.
That's true. For an unattended transform, we explicitly skip destination index creation on the _start call (which is part of package install).
It turns out, the destination index is only created when there is exact data that matches the criteria (e.g. fields host.name, destination.ip, etc. exist in logs-*) for the transform to run
With unattended, the destination index is created when the first document is written to it.
So, if there is matching data in the source index, you'll eventually see the destination index created and the first results written to it.
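Since the destination index only appears once the first document is written, a caller that needs it has to wait rather than assume it exists right after install. A minimal sketch of such a wait loop follows. The function name, parameters, and the injected index_exists callable are illustrative assumptions, not part of Elasticsearch or Fleet; in practice index_exists might wrap a HEAD request to the index.

```python
import time
from typing import Callable


def wait_for_destination_index(
    index_exists: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until index_exists() returns True or the timeout elapses.

    index_exists is a caller-supplied check (hypothetical here); sleep and
    clock are injectable so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if index_exists():
            return True
        if clock() >= deadline:
            # The transform has not written anything yet, for example because
            # the source indices contain no matching data.
            return False
        sleep(poll_interval_s)
```

A False result here does not mean the install failed; with an unattended transform it may only mean that no matching source data has arrived yet.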
This gives the impression that the package hasn't fully been installed.
I understand your concern here. Without explicit destination index creation, it is less predictable when exactly the destination index (and its aliases) will be set up.
Is this expected behavior with the unattended flag?
Confirmed, this is working as intended.
If so, can it be implemented so the behavior is the same as before (create destination index regardless of available data) so that it's clearer to users when the transform and associated indices have been created?
I'll need to think about how such a change would fit the current codebase. I'll follow up on this issue soon.
@susan-shu-c, it seems you can achieve what you need by reverting the transform to non-unattended. Specifically, you want these two settings in your transform config:
"settings": {
  "unattended": false,
  "num_failure_retries": -1
}
This way the transform will not be unattended (so it will create the destination index just like it used to), but it will still retry most failures indefinitely.
Having said that, there will still be failures that are not retried (like a script exception), so the transform will not be fully unattended.
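As a rough illustration of the suggestion above, the sketch below applies those two settings to an existing transform config held as a Python dict. The helper name and the example config (index names included) are hypothetical; only the two settings keys come from this thread.

```python
import json
from copy import deepcopy


def with_attended_retry_settings(transform_config: dict) -> dict:
    """Return a copy of the config with unattended off and unlimited retries."""
    updated = deepcopy(transform_config)
    settings = updated.setdefault("settings", {})
    settings["unattended"] = False       # destination index is created on _start
    settings["num_failure_retries"] = -1  # retry retryable failures without limit
    return updated


if __name__ == "__main__":
    # Hypothetical transform config used only for illustration.
    example = {
        "source": {"index": ["logs-*"]},
        "dest": {"index": "example-dest-index"},
        "settings": {"unattended": True},
    }
    print(json.dumps(with_attended_retry_settings(example)["settings"], indent=2))
```

The helper copies the config instead of mutating it, so the packaged definition stays untouched while the modified version is tried out.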
Are there any reasons (other than indefinite retry limit) that made you switch to unattended?
Pasting our Slack conversation for reference:
We added unattended: true so that the install would work on Serverless (as requested by Sophie Chang, not going to link it here as it was an internal GitHub discussion)
I was able to reproduce the issue locally.
The problem is that if the transform destination index is created dynamically (not on _start but later during indexing), then we do not set up this index's aliases.
This is a bug that we need to fix in our backend code.
FYI: I have opened a PR with the fix (https://github.com/elastic/elasticsearch/pull/105499).
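For context, here is a sketch of what a destination section with aliases might look like, assuming the aliases are declared through the transform's dest.aliases setting. The helper name, the index name, and the choice of move_on_creation values for the .latest and .all aliases are assumptions made for illustration, not taken from the packages discussed here.

```python
def build_dest_with_aliases(dest_index: str) -> dict:
    """Build a transform 'dest' section with .latest and .all aliases.

    Assumption: '<index>.latest' should point only at the newest destination
    index (move_on_creation=True), while '<index>.all' accumulates every
    destination index (move_on_creation=False).
    """
    return {
        "index": dest_index,
        "aliases": [
            {"alias": f"{dest_index}.latest", "move_on_creation": True},
            {"alias": f"{dest_index}.all", "move_on_creation": False},
        ],
    }


if __name__ == "__main__":
    # Hypothetical destination index name.
    print(build_dest_with_aliases("example-dest-index"))
```

With the bug described above, aliases declared this way would be missing whenever the destination index was created lazily during indexing instead of on _start.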
Awesome, thank you! So with #105499 we can install packages with unattended: true or unattended: false and in both cases, the destination index will be created on package install?
Not exactly. This bugfix ensures the destination index and its aliases are set up correctly once the transform sees source indices and is ready to start processing them. This should solve your immediate problem of missing aliases and should be enough for your setup to work correctly (but of course let us know if that is not the case and there are further issues).
Creating the destination index before source indices are ready is a more complex topic that we want to tackle too, but we won't have a solution for it in 8.13.
We need to redesign the transform's workflow to accommodate this change, which is why we don't want to rush it before feature freeze.