[FEATURE] Allow Rollup jobs to handle huge index in chunks
Is your feature request related to a problem? Every day I have an index of ~135GB of primary shards (automatically rolled over) that I need to roll up, but the jobs keep failing with "heap usage exceeded" and "elapsed time exceeded" because the data set is too big. The rollup job has 1 sum aggregation field and 6 dimensions, and each dimension has a lot of values (10k-200k, like IP address or hostname). I have tried multiple values for page_size and the datetime interval, but none of them worked. Rollup jobs currently process the whole index in one go.
What solution would you like? I want a configuration that lets me divide the data into smaller chunks (e.g. by datetime or by size). For example, the rollup job would query data in a 30-minute window, calculate sum, avg, min, and max, and then move on to the next 30-minute window.
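Conceptually, each chunk could be the same composite aggregation the job already runs, just restricted to one window with a range query instead of match_all. A rough sketch of a single 30-minute chunk (the timestamps are placeholders, only two of my six dimensions are shown, and the exact chunking mechanics would of course be up to the plugin):
{
  "size": 0,
  "query": {
    "range": {
      "datetime": {
        "gte": "2023-01-01T00:00:00Z",
        "lt": "2023-01-01T00:30:00Z"
      }
    }
  },
  "aggregations": {
    "test": {
      "composite": {
        "size": 1000,
        "sources": [
          { "datetime.date_histogram": { "date_histogram": { "field": "datetime", "fixed_interval": "30m", "time_zone": "UTC" } } },
          { "clientip.terms": { "terms": { "field": "clientip" } } }
        ]
      },
      "aggregations": {
        "transactionsize.sum": { "sum": { "field": "transactionsize" } }
      }
    }
  }
}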
What alternatives have you considered? None
Do you have any additional context? When I open the debug logs, this is the query the rollup job is using.
"size": 0,
"query": {
"match_all": {
"boost": 1.0
}
},
"track_total_hits": -1,
"aggregations": {
"test": {
"composite": {
"size": 1,
"sources": [
{
"datetime.date_histogram": {
"date_histogram": {
"field": "datetime",
"missing_bucket": true,
"order": "asc",
"fixed_interval": "30m",
"time_zone": "UTC"
}
}
},
{
"clientip.terms": {
"terms": {
"field": "clientip",
"missing_bucket": true,
"order": "asc"
}
}
}
]
},
"aggregations": {
"transactionsize.sum": {
"sum": {
"field": "transactionsize"
}
}
}
}
}
}
Do you want to try continuous mode rollup, and set the running schedule to 30m?
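For reference, roughly something like the following when creating the job (the index names and start_time are placeholders, and I haven't double-checked every field against the create rollup job API docs):
PUT _plugins/_rollup/jobs/example_continuous_rollup
{
  "rollup": {
    "enabled": true,
    "continuous": true,
    "schedule": {
      "interval": {
        "period": 30,
        "unit": "Minutes",
        "start_time": 1602100553
      }
    },
    "description": "Continuous rollup running every 30 minutes",
    "source_index": "my-source-index-*",
    "target_index": "my-rollup-index",
    "page_size": 200,
    "delay": 0,
    "dimensions": [
      {
        "date_histogram": {
          "source_field": "datetime",
          "fixed_interval": "30m",
          "timezone": "UTC"
        }
      },
      {
        "terms": {
          "source_field": "clientip"
        }
      }
    ],
    "metrics": [
      {
        "source_field": "transactionsize",
        "metrics": [{ "sum": {} }]
      }
    ]
  }
}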
I thought about that. I have a large set of data each day, so that means I have to roll over the rollup indices as well. Unfortunately, when you set up a rollup job, if you use a wildcard in the source index, you can't use a dynamic target (e.g. rollup_{{ctx.source_index}}). And because it's a continuous rollup job, you can't roll over the target index.
Remember there's an example in the documentation that seems to help here. Please let us know whether it works.
Yes, I have tried this, and it was the original problem: the index size was too big by the time it rolled over and rolled up, causing the job to always fail (heap memory, execution time).
Got it, the rollup created by the ISM action is fixed to be non-continuous: https://github.com/opensearch-project/index-management/blob/a2dd769f8ca62b45d89b8ba6e0d32770eeebf1ae/src/main/kotlin/org/opensearch/indexmanagement/rollup/model/ISMRollup.kt#L80
If the ISMRollup data class supported a user-provided continuous parameter, then I think we could use ISM to create a continuous rollup first and roll over later. This seems to be a way forward (rough sketch below).
@DucQuach wanna check first if you'd like to help contribute this?
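At the policy level that could surface as an extra flag on the ISM rollup action, something like the following (the continuous field is hypothetical and does not exist today; the other fields are abbreviated):
{
  "rollup": {
    "ism_rollup": {
      "description": "Continuous rollup created through ISM",
      "target_index": "rollup_{{ctx.source_index}}",
      "continuous": true,
      "page_size": 200,
      "dimensions": [
        { "date_histogram": { "source_field": "datetime", "fixed_interval": "30m", "timezone": "UTC" } },
        { "terms": { "source_field": "clientip" } }
      ],
      "metrics": [
        { "source_field": "transactionsize", "metrics": [{ "sum": {} }] }
      ]
    }
  }
}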
No, sorry. I’m just a user.
We will plan this, thanks!
@DucQuach I checked the 2 error messages you mentioned, "heap usage exceeded" and "elapsed time exceeded"; they seem to come from the search backpressure feature. Would you want to try tuning the settings of this feature, or disabling it, and see if that works for you? https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/search-backpressure/#search-backpressure-settings
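For example, a quick first experiment could be switching the feature to monitor-only mode via cluster settings (the individual threshold settings are listed in the linked page):
PUT _cluster/settings
{
  "persistent": {
    "search_backpressure": {
      "mode": "monitor_only"
    }
  }
}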
The rollup query doesn't seem too heavy to me, either.
I'm using the AWS managed OpenSearch service, so that can't be configured. What is the size of your index? Can you try with an index size >= 135GB? I'm trying to roll up the index by 5 fields, each of which can have up to 20-50k unique values.
Another way I can resolve this is to use continuous rollup jobs with the same target index, but then there is no support for deleting old data in the rollup index. By comparison, Elasticsearch supports _delete_by_query in a rollup index (they have had this feature since 6.x): https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-delete-job.html#:~:text=If%20you%20wish%20to%20also,ID%20in%20the%20rollup%20index. Is it possible for OpenSearch to support this feature?
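What I would like to be able to run against the rollup target index is something like this, to drop buckets older than a retention window (the field name assumes the rolled-up documents keep a queryable datetime field, which I haven't verified):
POST my-rollup-index/_delete_by_query
{
  "query": {
    "range": {
      "datetime": {
        "lt": "now-30d"
      }
    }
  }
}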
I see. Would you reach out to AWS support and ask them to cut a ticket to the service team regarding this issue?
alright, thanks