create-snapshot-repository should run as a job, not a cronjob

Open gianrubio opened this issue 6 years ago • 3 comments

The current create-snapshot-repository cronjob runs every 30 minutes; it should run only once.

ES logs

java.lang.IllegalStateException: trying to modify or unregister repository that is currently used 
	at org.elasticsearch.repositories.RepositoriesService.ensureRepositoryNotInUse(RepositoriesService.java:395) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.repositories.RepositoriesService.access$000(RepositoriesService.java:56) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.repositories.RepositoriesService$1.execute(RepositoriesService.java:107) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:640) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:270) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:195) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:130) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-6.1.3.jar:6.1.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-04-09T13:36:36,921][WARN ][o.e.r.RepositoriesService] [035de622-f6ed-48df-b3de-8653ea93b0d1] failed to create repository [elasticsearch-bkp]
java.lang.IllegalStateException: trying to modify or unregister repository that is currently used 
	at org.elasticsearch.repositories.RepositoriesService.ensureRepositoryNotInUse(RepositoriesService.java:395) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.repositories.RepositoriesService.access$000(RepositoriesService.java:56) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.repositories.RepositoriesService$1.execute(RepositoriesService.java:107) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:640) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:270) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:195) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:130) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-6.1.3.jar:6.1.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

Job log

$ kubectl logs -f   -n logging elastic-logging-cluster-create-repository-1523278268-zvjb6
2018/04/09 14:03:39 [elasticsearch-cron] is up and running! 2018-04-09 14:03:39.740651389 +0000 UTC m=+0.001094446
time="2018-04-09T14:03:39Z" level=info msg="About to create Snapshot Repository..."
time="2018-04-09T14:03:39Z" level=error msg="Error creating snapshot repository [httpstatus: 500][url: https://elasticsearch-logging-cluster.logging.svc.cluster.local:9200/_snapshot/elasticsearch-bkp.ca.prod.u][body: {\"error\":{\"root_cause\":[{\"type\":\"repository_verification_exception\",\"reason\":\"[elasticsearch-bkp.ca.prod.u] [[3ndqxf-vSYyjkYJhrNmXng, 'RemoteTransportException[[7fc53062-2a3a-4902-9afe-0d905be4fca2][100.96.4.29:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];'], [P9G_LXVuS5uV7EsTyaDv3g, 'RemoteTransportException[[57caf2e8-96fd-46a6-b355-199d9e66b46d][100.96.5.21:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];']]\"}],\"type\":\"repository_verification_exception\",\"reason\":\"[elasticsearch-bkp.ca.prod.u] [[3ndqxf-vSYyjkYJhrNmXng, 'RemoteTransportException[[7fc53062-2a3a-4902-9afe-0d905be4fca2][100.96.4.29:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];'], [P9G_LXVuS5uV7EsTyaDv3g, 'RemoteTransportException[[57caf2e8-96fd-46a6-b355-199d9e66b46d][100.96.5.21:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[elasticsearch-bkp.ca.prod.u] missing];']]\"},\"status\":500}] "

gianrubio avatar Apr 09 '18 14:04 gianrubio

It would be nice to have a check. The reason it runs over and over is to cover the case where you spin up the cluster but don't have things set up correctly. We had issues where snapshots wouldn't work because the repository didn't exist (its creation had failed on the first run).

stevesloka avatar Apr 09 '18 14:04 stevesloka

Actually, this started after I applied my patch https://github.com/upmc-enterprises/elasticsearch-cron/pull/3 to the cronjob. That means it was already happening before, but it was hidden by the wrong exit status.

gianrubio avatar Apr 10 '18 07:04 gianrubio

Yeah, I think the logic needs to change here. Maybe have the cron check whether the repo exists first, and handle the result of that before starting.

stevesloka avatar Aug 17 '18 13:08 stevesloka
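
For illustration, the "check if the repo exists first" idea from the last comment could look roughly like the sketch below. This is not the elasticsearch-cron code. The function names, the `repoAction` type, and the `ES_URL` / `REPO_NAME` environment variables are all hypothetical, and TLS and authentication setup are left out. It only assumes that `GET /_snapshot/<name>` answers 200 for a registered repository and 404 for a missing one.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// repoAction is a hypothetical type: what the job should do after the check.
type repoAction int

const (
	actionSkip   repoAction = iota // repository already registered, leave it alone
	actionCreate                   // repository missing, register it
)

// decideAction maps the HTTP status of GET /_snapshot/<name> to an action.
// Any status other than 200 or 404 is reported as an error so the job can
// exit non-zero instead of re-registering a repository that is in use.
func decideAction(status int) (repoAction, error) {
	switch status {
	case http.StatusOK:
		return actionSkip, nil
	case http.StatusNotFound:
		return actionCreate, nil
	default:
		return actionSkip, fmt.Errorf("unexpected status %d while checking repository", status)
	}
}

// repositoryAction asks Elasticsearch whether the repository exists and
// returns the action to take.
func repositoryAction(client *http.Client, baseURL, name string) (repoAction, error) {
	resp, err := client.Get(baseURL + "/_snapshot/" + url.PathEscape(name))
	if err != nil {
		return actionSkip, err
	}
	defer resp.Body.Close()
	return decideAction(resp.StatusCode)
}

func main() {
	// ES_URL and REPO_NAME are assumed names for this sketch only.
	baseURL, name := os.Getenv("ES_URL"), os.Getenv("REPO_NAME")
	if baseURL == "" || name == "" {
		fmt.Println("set ES_URL and REPO_NAME to run the check")
		return
	}

	client := &http.Client{Timeout: 10 * time.Second}
	action, err := repositoryAction(client, baseURL, name)
	if err != nil {
		fmt.Fprintln(os.Stderr, "repository check failed:", err)
		os.Exit(1)
	}
	if action == actionSkip {
		fmt.Println("repository already exists, nothing to do")
		return
	}
	fmt.Println("repository missing, it should be created now")
}
```

With a check like this in front of the create call, repeated runs would leave an existing repository untouched, which should avoid the "trying to modify or unregister repository that is currently used" error shown in the ES logs, while still recovering when the first run failed.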