es7-persistence creates an unlimited number of indices
Describe the bug
After a certain amount of time, a shard-count problem arose. We checked ES and found around 2300 shards, with names like conductor_task_log_20221004.
Investigating the code, I see:
```java
// class ElasticSearchRestDAOV7
private void createIndexesTemplates() {
    try {
        initIndexesTemplates();
        updateIndexesNames();
        Executors.newScheduledThreadPool(1)
                .scheduleAtFixedRate(this::updateIndexesNames, 0, 1, TimeUnit.HOURS);
    } catch (Exception e) {
        logger.error("Error creating index templates!", e);
    }
}

// ...

private void updateIndexesNames() {
    logIndexName = updateIndexName(LOG_DOC_TYPE);
    eventIndexName = updateIndexName(EVENT_DOC_TYPE);
    messageIndexName = updateIndexName(MSG_DOC_TYPE);
}

private String updateIndexName(String type) {
    String indexName =
            this.indexPrefix + "_" + type + "_" + SIMPLE_DATE_FORMAT.format(new Date());
    try {
        addIndex(indexName);
        return indexName;
    } catch (IOException e) {
        logger.error("Failed to update log index name: {}", indexName, e);
        throw new NonTransientException(e.getMessage(), e);
    }
}
```
Here updateIndexesNames creates a new index every week. Can someone explain why it changes the names of the indices?
*_conductor_task_log*
*_conductor_message*
*_conductor_event*
I'd like to change this behavior, because we are reaching the cluster's shard limit.
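To make the rotation mechanics concrete, here is a minimal self-contained sketch of what updateIndexName effectively does. The real pattern is whatever SIMPLE_DATE_FORMAT holds in ElasticSearchRestDAOV7; the "yyyyMMdd" pattern below is an assumption based on the observed index names such as conductor_task_log_20221004:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class IndexNameDemo {

    // Stand-in for the DAO's SIMPLE_DATE_FORMAT; "yyyyMMdd" is an assumption
    // inferred from the observed index names, not taken from the source.
    private static final SimpleDateFormat SUFFIX = new SimpleDateFormat("yyyyMMdd");
    static {
        SUFFIX.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    // Mirrors updateIndexName: prefix + "_" + type + "_" + formatted date.
    static String indexName(String prefix, String type, Date now) {
        return prefix + "_" + type + "_" + SUFFIX.format(now);
    }

    public static void main(String[] args) {
        Date day1 = new Date(1664841600000L); // 2022-10-04T00:00:00Z
        Date day2 = new Date(1664928000000L); // 2022-10-05T00:00:00Z
        // Each time the suffix rolls over, the hourly scheduler computes a
        // previously unseen name, and addIndex() then creates a fresh index.
        System.out.println(indexName("conductor", "task_log", day1)); // conductor_task_log_20221004
        System.out.println(indexName("conductor", "task_log", day2)); // conductor_task_log_20221005
    }
}
```

Since the scheduler never deletes anything, every rollover of the suffix adds a new set of indices (and shards) to the cluster indefinitely.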
I also checked that the methods below are not used anywhere:
- IndexDAO.getEventExecutions
- IndexDAO.getMessages
so it is probably safe to stop indexing them:

```
conductor.app.eventMessageIndexingEnabled=false
conductor.app.eventExecutionIndexingEnabled=false
```
However, to stop creating new indices, we need to change this config:

```
conductor.elasticsearch.autoIndexManagementEnabled=false
```
and provision the Conductor indices manually.
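If you go the manual route, provisioning an index up front might look like the following (a sketch in ES Dev Tools console syntax; the index name and settings are illustrative, and with auto index management disabled the templates/mappings presumably need to be provisioned as well):

```
PUT /conductor_task_log
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}
```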
Details
- Conductor version: 3.18.0+
- Persistence implementation: Postgres
- Queue implementation: Postgres
- Lock: Redis
To Reproduce
Steps to reproduce the behavior: just use Elasticsearch for a while.
Expected behavior
Data is cleaned up periodically.
I'm also studying the housekeeping mechanism now. I think it's a good idea to set an Elasticsearch ILM policy on the index patterns; it's not implemented in Conductor yet, though, so you have to configure it manually in Elasticsearch.
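For reference, a minimal ILM policy that simply deletes old indices could look like this (the policy name and retention period are made up; the policy also has to be attached to the conductor_* index patterns via an index template's index.lifecycle.name setting):

```
PUT _ilm/policy/conductor_cleanup
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```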
👋 Hi @astelmashenko
We're currently reviewing open issues in the Conductor OSS backlog, and noticed that this issue hasn't been addressed.
To help us keep the backlog focused and actionable, we’d love your input:
- Is this issue still relevant?
- Has the problem been resolved in the latest version v3.21.12?
- Do you have any additional context or updates to provide?
If we don’t hear back in the next 14 days, we’ll assume this issue is no longer active and will close it for housekeeping. Of course, if it's still a valid issue, just let us know and we’ll keep it open!
Thanks for contributing to Conductor OSS! We appreciate your support. 🙌
Jeff Bull
Developer Community Manager | Orkes
@jeffbulltech, yes, the issue is still relevant on any version; I checked, and the es7-persistence codebase is still the same.
Thanks for getting back to me, @astelmashenko. I'll make sure this issue remains open so it can be reviewed for an upcoming release.
Yes, this issue is still relevant.
We are now also running into ES7 limits, where it even crashes our prod system...
```
4710 [main] ERROR com.netflix.conductor.es7.dao.index.ElasticSearchRestDAOV7 [] - Error creating index templates!
com.netflix.conductor.core.exception.NonTransientException: method [PUT], host [http://es:9200], URI [/conductor_task_log_20250804], status line [HTTP/1.1 400 Bad Request]
error={"root_cause":[{"type":"validation_exception","reason":"Validation Failed: 1: this action would add [10] shards, but this cluster currently has [997]/[1000] maximum normal shards open;"}],"type":"validation_exception","reason":"Validation Failed: 1: this action would add [10] shards, but this cluster currently has [997]/[1000] maximum normal shards open;"} status=400
```
When I investigated this via ES:9200/_cluster/allocation/explain,
I could see the following reason: 'A copy of this shard is already allocated to this node: [conductor_event_20250804][1], [node XXXX], [P], [s STARTED], [a id=XXXX]'.
By default, Conductor uses indexShardCount = 5 and indexReplicasCount = 1.
However, when running ES as a single node, the replica shards can never be allocated (a replica is not placed on the same node as its primary), so every index also leaves 'unassigned' shards behind, which eat into the node's shard budget very quickly.
I know it's best practice to run multiple ES nodes, but keeping the Search workflow execution "accurate" has not been a high priority for us, especially on testing/dev systems, which get "reset" very often anyway, so the problem doesn't occur there.
For now, I am therefore using these settings for a single-node ES:

```
conductor.elasticsearch.indexShardCount=2
conductor.elasticsearch.index.replicas.count=0
```
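As a back-of-the-envelope check on why the limit is hit so fast (consistent with the "would add [10] shards" in the error above, i.e. 5 primaries + 5 replicas for one new index), here is the shard cost per rotation for the three rotating doc types (task_log, event, message), under the defaults versus the single-node settings above:

```java
public class ShardBudget {

    // Shards consumed per index rotation:
    // docTypes * primaries * (1 + replicas).
    static int shardsPerRotation(int docTypes, int primaries, int replicas) {
        return docTypes * primaries * (1 + replicas);
    }

    public static void main(String[] args) {
        // Defaults: 3 rotating doc types, 5 primaries, 1 replica.
        int perRotation = shardsPerRotation(3, 5, 1);
        System.out.println(perRotation);        // 30 shards per rotation
        System.out.println(1000 / perRotation); // ~33 rotations until a 1000-shard limit
        // Single-node settings above: 2 primaries, 0 replicas.
        System.out.println(shardsPerRotation(3, 2, 0)); // 6 shards per rotation
    }
}
```

So the defaults exhaust a 1000-shard cluster after roughly 33 rotations; the reduced settings only slow this down, while an ILM policy or disabled rotation is what actually stops the growth.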
These settings were hard to find because they are not mentioned in the Conductor documentation:
- https://conductor-oss.github.io/conductor/documentation/configuration/appconf.html#example-usage
They are taken directly from here:
- https://github.com/conductor-oss/conductor/blob/main/es7-persistence/src/main/java/com/netflix/conductor/es7/config/ElasticSearchProperties.java