Test k8s ex job failure scenario
There is probably an issue with handling the case where the k8s job for the teraslice execution has 0 running pods due to external failure. I think this results in workers being abandoned. Maybe Mike can say more.
Correct, once the execution controller job is marked as terminated, completed, etc. and the pod goes away, the deployment / workers remain but end up in a crash loop. In the case I experienced, the state store went offline, causing the orphan situation.
Possible ways to replicate (a rough command sketch of both approaches follows this list):
- Stop the stateful storage or drop egress via network policy and wait for the jobs to fail
- Scale the worker deployments to 0 and I think the execution controller job may complete
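For concreteness, here is a rough, untested sketch of both approaches. The namespace is a placeholder, the NetworkPolicy pod label is an assumption, and the worker deployment name is taken from the example logs below; adjust all of them to your cluster:

```bash
# Approach 1: drop all egress from teraslice pods so the state store becomes
# unreachable, then wait for executions to fail.
kubectl -n <teraslice namespace> apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ts-deny-egress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: teraslice   # assumption: use whatever labels your teraslice pods actually carry
  policyTypes:
    - Egress
  egress: []   # no egress rules => all egress from matching pods is dropped
EOF

# Approach 2: scale one execution's worker deployment to 0 and watch what the
# execution controller job does.
kubectl -n <teraslice namespace> scale deployment ts-wkr-example-data-generator-job-0c8eaaee-146e --replicas=0
kubectl -n <teraslice namespace> get jobs,deployments
```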
When trying to clean up from my situation where elasticsearch became unavailable (the whole sequence is also sketched as commands below):
- First thing I did was `kubectl -n <teraslice namespace> get deployments --show-labels=true` and recorded all the jobids still in k8s.
- Tried to call `/jobs/<jobid>/_stop` to clean up the exec job plus worker deployment under k8s:
  - If it responded with `stopped`, then I did nothing else since everything cleaned up normally.
  - If it responded with anything else (like `terminated`), I manually deleted the execution controller (job) and workers (deployment).
- I then called `/jobs/<jobid>/_start` to redeploy the job after the original state issue was resolved.
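Put together, the cleanup sequence above looks roughly like this; `<teraslice namespace>`, `<jobid>`, and `<teraslice-master>` are placeholders, and the `ts-exc-*` / `ts-wkr-*` names follow the resource naming visible in the logs further down:

```bash
# 1. Record the job ids still present in k8s
kubectl -n <teraslice namespace> get deployments --show-labels=true

# 2. Ask teraslice to stop the job; a "stopped" response means everything cleaned up normally
curl -Ss -X POST http://<teraslice-master>/jobs/<jobid>/_stop

# 3. If it responded with anything else (like "terminated"), delete the orphaned
#    execution controller job and worker deployment by hand
kubectl -n <teraslice namespace> delete job ts-exc-<job name>-<jobid prefix>
kubectl -n <teraslice namespace> delete deployment ts-wkr-<job name>-<jobid prefix>

# 4. Once the original state issue is resolved, redeploy the job
curl -Ss -X POST http://<teraslice-master>/jobs/<jobid>/_start
```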
I think ideally, once a job is completed, terminated, etc., the now-orphaned deployment either gets cleaned up or scaled down to 0 to avoid unnecessary crash loops or k8s cruft. I suspect deciding which way to go depends on how jobs are expected to be recovered in the event of execution controller completion / failure.
The one time I managed to get the execution controller to shut down by scaling to zero (see #1018), it cleaned up all the other bits with it. I know there's something here though, so I am not taking this as really compelling evidence.
I just tried to generate a job failure scenario by doing the following:

> Scale the worker deployments to 0 and I think the execution controller job may complete

The job ends up in the `failed` state with the following `_failureReason`:
curl -Ss $(minikube ip):30678/ex/2a934ff3-9b23-445f-ae35-9a880158c983 | jq -r ._failureReason
TSError: slicer for ex 2a934ff3-9b23-445f-ae35-9a880158c983 had an error, shutting down execution, caused by Error: All workers from workers from 2a934ff3-9b23-445f-ae35-9a880158c983 have disconnected
at ExecutionController._terminalError (/app/source/packages/teraslice/lib/workers/execution-controller/index.js:322:23)
at Timeout.<anonymous> (/app/source/packages/teraslice/lib/workers/execution-controller/index.js:983:18)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
Caused by: Error: All workers from workers from 2a934ff3-9b23-445f-ae35-9a880158c983 have disconnected
at ExecutionController._startWorkerDisconnectWatchDog (/app/source/packages/teraslice/lib/workers/execution-controller/index.js:975:21)
at /app/source/packages/teraslice/lib/workers/execution-controller/index.js:174:18
at Server.<anonymous> (/app/source/packages/teraslice-messaging/dist/src/messenger/server.js:179:13)
at Server.emit (events.js:327:22)
at Server.emit (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:108:15)
at Server.updateClientState (/app/source/packages/teraslice-messaging/dist/src/messenger/server.js:279:18)
at Socket.<anonymous> (/app/source/packages/teraslice-messaging/dist/src/messenger/server.js:317:22)
at Socket.emit (events.js:315:20)
at Socket.emit (/app/source/node_modules/socket.io/lib/socket.js:141:10)
at Socket.onclose (/app/source/node_modules/socket.io/lib/socket.js:441:8)
at Client.onclose (/app/source/node_modules/socket.io/lib/client.js:235:24)
at Socket.emit (events.js:327:22)
at Socket.onClose (/app/source/node_modules/engine.io/lib/socket.js:311:10)
at Object.onceWrapper (events.js:421:28)
at WebSocket.emit (events.js:315:20)
at WebSocket.Transport.onClose (/app/source/node_modules/engine.io/lib/transport.js:127:8)
The logs in the Execution Controller pod after the scale down are:
[2020-08-19T20:36:46.473Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: client 172.17.0.6__6JW2Dpcy disconnected { reason: 'transport close' } (assignment=execution_controller, module=messaging:server, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:36:46.486Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: client 172.17.0.7__yyJqPfWe disconnected { reason: 'transport close' } (assignment=execution_controller, module=messaging:server, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.489Z] ERROR: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: slicer for ex 2a934ff3-9b23-445f-ae35-9a880158c983 had an error, shutting down execution, caused by Error: All workers from workers from 2a934ff3-9b23-445f-ae35-9a880158c983 have disconnected (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae, err.code=INTERNAL_SERVER_ERROR)
TSError: slicer for ex 2a934ff3-9b23-445f-ae35-9a880158c983 had an error, shutting down execution, caused by Error: All workers from workers from 2a934ff3-9b23-445f-ae35-9a880158c983 have disconnected
at ExecutionController._terminalError (/app/source/packages/teraslice/lib/workers/execution-controller/index.js:322:23)
at Timeout.<anonymous> (/app/source/packages/teraslice/lib/workers/execution-controller/index.js:983:18)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
Caused by: Error: All workers from workers from 2a934ff3-9b23-445f-ae35-9a880158c983 have disconnected
at ExecutionController._startWorkerDisconnectWatchDog (/app/source/packages/teraslice/lib/workers/execution-controller/index.js:975:21)
at /app/source/packages/teraslice/lib/workers/execution-controller/index.js:174:18
at Server.<anonymous> (/app/source/packages/teraslice-messaging/dist/src/messenger/server.js:179:13)
at Server.emit (events.js:327:22)
at Server.emit (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:108:15)
at Server.updateClientState (/app/source/packages/teraslice-messaging/dist/src/messenger/server.js:279:18)
at Socket.<anonymous> (/app/source/packages/teraslice-messaging/dist/src/messenger/server.js:317:22)
at Socket.emit (events.js:315:20)
at Socket.emit (/app/source/node_modules/socket.io/lib/socket.js:141:10)
at Socket.onclose (/app/source/node_modules/socket.io/lib/socket.js:441:8)
at Client.onclose (/app/source/node_modules/socket.io/lib/client.js:235:24)
at Socket.emit (events.js:327:22)
at Socket.onClose (/app/source/node_modules/engine.io/lib/socket.js:311:10)
at Object.onceWrapper (events.js:421:28)
at WebSocket.emit (events.js:315:20)
at WebSocket.Transport.onClose (/app/source/node_modules/engine.io/lib/transport.js:127:8)
[2020-08-19T20:37:46.509Z] FATAL: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution 2a934ff3-9b23-445f-ae35-9a880158c983 is ended because of slice failure (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.509Z] DEBUG: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: stopping scheduler... (assignment=execution_controller, module=execution_scheduler, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.509Z] DEBUG: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution 2a934ff3-9b23-445f-ae35-9a880158c983 is finished scheduling, 7 remaining slices in the queue (assignment=execution_controller, module=execution_scheduler, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.511Z] WARN: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: clients are all offline, but there are still 1 pending slices (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.512Z] DEBUG: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution 2a934ff3-9b23-445f-ae35-9a880158c983 did not finish (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.517Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: [START] "elasticsearch_data_generator" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.517Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: [FINISH] "elasticsearch_data_generator" operation shutdown, took 1ms (assignment=execution_controller, module=slicer_context, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.520Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: calculating statistics (assignment=execution_controller, module=slice_analytics, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.520Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: (assignment=execution_controller, module=slice_analytics, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
operation elasticsearch_data_generator
average completion time of: 673.63 ms, min: 473 ms, and max: 886 ms
average size: 5000, min: 5000, and max: 5000
average memory: 5183018, min: -7736312, and max: 8422968
[2020-08-19T20:37:46.520Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: (assignment=execution_controller, module=slice_analytics, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
operation example-op
average completion time of: 0.13 ms, min: 0 ms, and max: 1 ms
average size: 5000, min: 5000, and max: 5000
average memory: 2640, min: 1544, and max: 4848
[2020-08-19T20:37:46.520Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: (assignment=execution_controller, module=slice_analytics, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
operation delay
average completion time of: 30000.75 ms, min: 30000 ms, and max: 30001 ms
average size: 5000, min: 5000, and max: 5000
average memory: -3723653, min: -12046792, and max: 208104
[2020-08-19T20:37:46.520Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: (assignment=execution_controller, module=slice_analytics, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
operation elasticsearch_index_selector
average completion time of: 24.5 ms, min: 18 ms, and max: 28 ms
average size: 5000, min: 5000, and max: 5000
average memory: 1633130, min: 1328184, and max: 1743240
[2020-08-19T20:37:46.520Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: (assignment=execution_controller, module=slice_analytics, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
operation elasticsearch_bulk
average completion time of: 312.75 ms, min: 293 ms, and max: 348 ms
average size: 5000, min: 5000, and max: 5000
average memory: 6128912, min: -4279272, and max: 9625368
[2020-08-19T20:37:46.520Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution 2a934ff3-9b23-445f-ae35-9a880158c983 has finished in 216 seconds (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.528Z] DEBUG: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution 2a934ff3-9b23-445f-ae35-9a880158c983 is done (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.629Z] DEBUG: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution shutdown was called for ex 2a934ff3-9b23-445f-ae35-9a880158c983 (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.631Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: shutting down. (assignment=execution_controller, module=ex_storage, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.632Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: shutting down (assignment=execution_controller, module=state_storage, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.636Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: client 2a934ff3-9b23-445f-ae35-9a880158c983 disconnected { reason: 'io client disconnect' } (assignment=execution_controller, module=messaging:client, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:46.835Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution_controller received process:SIGTERM, already shutting down, remaining 30s (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:51.638Z] WARN: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution controller 2a934ff3-9b23-445f-ae35-9a880158c983 is shutdown (assignment=execution_controller, module=execution_controller, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:51.639Z] INFO: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: execution_controller shutdown took 5s (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
[2020-08-19T20:37:52.640Z] DEBUG: teraslice/6 on ts-exc-example-data-generator-job-0c8eaaee-146e-spv6h: flushed logs successfully, will exit with code 0 (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=UevO1XrS, ex_id=2a934ff3-9b23-445f-ae35-9a880158c983, job_id=0c8eaaee-146e-4136-a1d7-acbe88951eae)
The logs in the master are:
[2020-08-19T20:37:46.530Z] DEBUG: teraslice/14 on teraslice-master-57b6b9b44d-4wzkp: execution 2a934ff3-9b23-445f-ae35-9a880158c983 finished, shutting down execution (assignment=cluster_master, module=execution_service, worker_id=XIZ1YZ4i)
[2020-08-19T20:37:46.538Z] INFO: teraslice/14 on teraslice-master-57b6b9b44d-4wzkp: k8s._deleteObjByExId: 2a934ff3-9b23-445f-ae35-9a880158c983 execution_controller jobs deleting: ts-exc-example-data-generator-job-0c8eaaee-146e (assignment=cluster_master, module=kubernetes_cluster_service, worker_id=XIZ1YZ4i)
[2020-08-19T20:37:46.571Z] INFO: teraslice/14 on teraslice-master-57b6b9b44d-4wzkp: k8s._deleteObjByExId: 2a934ff3-9b23-445f-ae35-9a880158c983 worker deployments deleting: ts-wkr-example-data-generator-job-0c8eaaee-146e (assignment=cluster_master, module=kubernetes_cluster_service, worker_id=XIZ1YZ4i)
[2020-08-19T20:37:46.638Z] INFO: teraslice/14 on teraslice-master-57b6b9b44d-4wzkp: client 2a934ff3-9b23-445f-ae35-9a880158c983 disconnected { reason: 'client namespace disconnect' } (assignment=cluster_master, module=messaging:server, worker_id=XIZ1YZ4i)
I think this is the desired behavior. Perhaps the `_failureReason` could be improved, but really, the workers went away ... and the execution controller times out, exits, and fails the execution. This seems correct. It's possible we'd want more information about the k8s resource to understand an unexpected failure like this. But I think this is OK.
Here are the logs in the execution pod when the master pod is shut down and restarted (one way to trigger that restart is sketched just below, followed by the logs).
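A minimal sketch of forcing that restart, assuming the cluster master still runs as the `teraslice-master` Deployment seen in the earlier master logs:

```bash
# Restart the cluster master pod and watch what the execution controller does.
kubectl -n <teraslice namespace> rollout restart deployment teraslice-master
# or delete the current master pod and let the Deployment recreate it:
kubectl -n <teraslice namespace> delete pod <teraslice-master pod name>
```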
[2023-11-16T18:04:56.067Z] INFO: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: worker 10.244.0.12__Ip8ilsTN has completed its slice 97a0a540-e56a-4e20-b9d9-392b515fa240 (assignment=execution_controller, module=execution_controller, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
[2023-11-16T18:04:56.075Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: dispatched slice d93c771c-5681-4e14-a486-be9a8a262b69 to worker 10.244.0.12__Ip8ilsTN (assignment=execution_controller, module=execution_controller, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
[2023-11-16T18:05:23.866Z] WARN: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: cluster master did not record the cluster analytics (assignment=execution_controller, module=execution_analytics, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
[2023-11-16T18:05:26.145Z] INFO: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: worker 10.244.0.12__Ip8ilsTN has completed its slice d93c771c-5681-4e14-a486-be9a8a262b69 (assignment=execution_controller, module=execution_controller, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
[2023-11-16T18:05:26.157Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: dispatched slice 62aea529-f6f4-420f-89c9-4435444485de to worker 10.244.0.12__Ip8ilsTN (assignment=execution_controller, module=execution_controller, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
[2023-11-16T18:05:56.215Z] INFO: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: worker 10.244.0.12__Ip8ilsTN has completed its slice 62aea529-f6f4-420f-89c9-4435444485de (assignment=execution_controller, module=execution_controller, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
[2023-11-16T18:05:56.225Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: dispatched slice a67e02dc-b8cc-4e62-a5e0-642d9475fda7 to worker 10.244.0.12__Ip8ilsTN (assignment=execution_controller, module=execution_controller, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
[2023-11-16T18:06:14.351Z] ERROR: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: Client ClusterMaster is not ready (assignment=execution_controller, module=messaging:client, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
Error: Client ClusterMaster is not ready
at Client.waitForClientReady (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:95:19)
at runNextTicks (node:internal/process/task_queues:60:5)
at process.processTimers (node:internal/timers:509:9)
at async Socket.<anonymous> (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:73:21)
[2023-11-16T18:06:16.364Z] ERROR: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: Client ClusterMaster is not ready (assignment=execution_controller, module=messaging:client, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
Error: Client ClusterMaster is not ready
at Client.waitForClientReady (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:95:19)
at async Socket.<anonymous> (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:73:21)
[2023-11-16T18:06:18.362Z] ERROR: teraslice/10 on ts-exc-kafka-to-es-d9054c49-8c6e-k2v8q: Client ClusterMaster is not ready (assignment=execution_controller, module=messaging:client, worker_id=dPlbIJ6p, ex_id=d06dfac0-d859-413c-b566-7de9280b91eb, job_id=d9054c49-8c6e-4729-bdc8-5cb4d0f90377)
Error: Client ClusterMaster is not ready
at Client.waitForClientReady (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:95:19)
at runNextTicks (node:internal/process/task_queues:60:5)
at process.processTimers (node:internal/timers:509:9)
at async Socket.<anonymous> (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:73:21)
We have made improvements to k8s; please check whether this:
https://github.com/terascope/teraslice/issues/893#issuecomment-676725895
still happens. If it does not, please indicate as much and close this issue. If it still happens, then consider a solution.
This is no longer a problem; updating the Node and library versions helped fix it.