elasticsearch-operator
create-snapshot-repository should run as a Job, not a CronJob
The current create-snapshot-repository cronjob runs every 30 minutes. It should run only once.
ES logs
java.lang.IllegalStateException: trying to modify or unregister repository that is currently used
at org.elasticsearch.repositories.RepositoriesService.ensureRepositoryNotInUse(RepositoriesService.java:395) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.repositories.RepositoriesService.access$000(RepositoriesService.java:56) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.repositories.RepositoriesService$1.execute(RepositoriesService.java:107) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:640) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:270) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:195) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:130) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-6.1.3.jar:6.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-04-09T13:36:36,921][WARN ][o.e.r.RepositoriesService] [035de622-f6ed-48df-b3de-8653ea93b0d1] failed to create repository [elasticsearch-bkp]
java.lang.IllegalStateException: trying to modify or unregister repository that is currently used
at org.elasticsearch.repositories.RepositoriesService.ensureRepositoryNotInUse(RepositoriesService.java:395) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.repositories.RepositoriesService.access$000(RepositoriesService.java:56) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.repositories.RepositoriesService$1.execute(RepositoriesService.java:107) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:640) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:270) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:195) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:130) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-6.1.3.jar:6.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Job log
$ kubectl logs -f -n logging elastic-logging-cluster-create-repository-1523278268-zvjb6
2018/04/09 14:03:39 [elasticsearch-cron] is up and running! 2018-04-09 14:03:39.740651389 +0000 UTC m=+0.001094446
time="2018-04-09T14:03:39Z" level=info msg="About to create Snapshot Repository..."
time="2018-04-09T14:03:39Z" level=error msg="Error creating snapshot repository [httpstatus: 500][url: https://elasticsearch-logging-cluster.logging.svc.cluster.local:9200/_snapshot/elasticsearch-bkp.ca.prod.u][body: {\"error\":{\"root_cause\":[{\"type\":\"repository_verification_exception\",\"reason\":\"[elasticsearch-bkp.ca.prod.u] [[3ndqxf-vSYyjkYJhrNmXng, 'RemoteTransportException[[7fc53062-2a3a-4902-9afe-0d905be4fca2][100.96.4.29:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];'], [P9G_LXVuS5uV7EsTyaDv3g, 'RemoteTransportException[[57caf2e8-96fd-46a6-b355-199d9e66b46d][100.96.5.21:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];']]\"}],\"type\":\"repository_verification_exception\",\"reason\":\"[elasticsearch-bkp.ca.prod.u] [[3ndqxf-vSYyjkYJhrNmXng, 'RemoteTransportException[[7fc53062-2a3a-4902-9afe-0d905be4fca2][100.96.4.29:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];'], [P9G_LXVuS5uV7EsTyaDv3g, 'RemoteTransportException[[57caf2e8-96fd-46a6-b355-199d9e66b46d][100.96.5.21:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];']]\"},\"status\":500}] "
It would be nice to have a check. The reason it runs over and over is that if you spin up the cluster without everything set up correctly, the first run fails and the repository never gets created. We had issues where snapshots wouldn't work because the repository didn't exist.
Actually, this started after I applied my patch https://github.com/upmc-enterprises/elasticsearch-cron/pull/3 to the cronjob. That means it was already happening before, but was hidden by the wrong exit status.
Yeah, I think the logic needs to change here. Maybe have the cron check whether the repo exists first, and handle that result before trying to create it.
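To make the "check first" idea concrete, here is a rough sketch in Go. It is not the actual elasticsearch-cron code: `repositoryExists`, `ensureRepository`, and `newFakeES` are hypothetical names, the S3 settings body is a placeholder, and the fake server only exists so the example runs without a cluster. It assumes `GET /_snapshot/<name>` answers 200 for a registered repository and 404 for a missing one, and only sends the `PUT` in the 404 case.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// repositoryExists asks Elasticsearch whether the snapshot repository is
// already registered: 200 means yes, 404 means no, anything else is an error.
func repositoryExists(client *http.Client, baseURL, name string) (bool, error) {
	resp, err := client.Get(baseURL + "/_snapshot/" + name)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusNotFound:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status %d checking repository %q", resp.StatusCode, name)
	}
}

// ensureRepository registers the repository only when it is missing.
// It returns true when it actually sent the create request.
func ensureRepository(client *http.Client, baseURL, name string, settings []byte) (bool, error) {
	exists, err := repositoryExists(client, baseURL, name)
	if err != nil || exists {
		return false, err
	}
	req, err := http.NewRequest(http.MethodPut, baseURL+"/_snapshot/"+name, bytes.NewReader(settings))
	if err != nil {
		return false, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("creating repository %q failed with status %d", name, resp.StatusCode)
	}
	return true, nil
}

// newFakeES is a hypothetical in-memory stand-in for the two _snapshot
// endpoints used above, so the sketch can run without a real cluster.
func newFakeES() *httptest.Server {
	var mu sync.Mutex
	registered := map[string]bool{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		name := strings.TrimPrefix(r.URL.Path, "/_snapshot/")
		mu.Lock()
		defer mu.Unlock()
		switch r.Method {
		case http.MethodGet:
			if !registered[name] {
				w.WriteHeader(http.StatusNotFound)
			}
		case http.MethodPut:
			registered[name] = true
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	}))
}

func main() {
	srv := newFakeES()
	defer srv.Close()
	// Placeholder settings; the real body depends on the repository type in use.
	settings := []byte(`{"type":"s3","settings":{"bucket":"example-bucket"}}`)
	for i := 0; i < 2; i++ {
		created, err := ensureRepository(srv.Client(), srv.URL, "elasticsearch-bkp", settings)
		fmt.Println(created, err) // first run: true <nil>, second run: false <nil>
	}
}
```

With something like this, a repeated run becomes a no-op instead of trying to re-register a repository that is in use, and a real failure still surfaces as an error the job can turn into a non-zero exit status.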