enterprise_gateway
OSError when trying to create spark notebook
Description
When I create a new Jupyter notebook and try to run some cells, I receive "Error Starting Kernel {"Gateway": "http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888", "Error": "Timeout during request"}"
When I check the enterprise-gateway logs, I see "OSError: [Errno 12] Cannot allocate memory". Only restarting enterprise-gateway clears the error.
What can I do about this?
Screenshots / Logs
Logs from enterprise-gateway:
[E 211221 08:32:53 web:1792] Uncaught exception POST /api/kernels (XXX.XXX.XXX.XXX)
HTTPServerRequest(protocol='http', host='enterprise-gateway.enterprise-gateway.svc.cluster.local:8888', method='POST', uri='/api/kernels', version='HTTP/1.1', remote_ip='1XXX.XXX.XXX.XXX')
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/web.py", line 1703, in _execute
    result = await result
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info) # type: ignore
  File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/handlers.py", line 112, in post
    yield super(MainKernelHandler, self).post()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info) # type: ignore
  File "/opt/conda/lib/python3.7/site-packages/notebook/services/kernels/handlers.py", line 46, in post
    kernel_id = yield maybe_future(km.start_kernel(kernel_name=model['name']))
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 151, in start_kernel
    kernel_id = await super(RemoteMappingKernelManager, self).start_kernel(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/notebook/services/kernels/kernelmanager.py", line 176, in start_kernel
    kernel_id = await maybe_future(self.pinned_superclass.start_kernel(self, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/jupyter_client/multikernelmanager.py", line 426, in start_kernel
    await km.start_kernel(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 350, in start_kernel
    await super(RemoteKernelManager, self).start_kernel(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/jupyter_client/manager.py", line 542, in start_kernel
    self.kernel = await self._launch_kernel(kernel_cmd, **kw)
  File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 406, in _launch_kernel
    proxy = await self.process_proxy.launch_process(kernel_cmd, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/processproxies/k8s.py", line 48, in launch_process
    await super(KubernetesProcessProxy, self).launch_process(kernel_cmd, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/processproxies/container.py", line 76, in launch_process
    self.local_proc = self.launch_kernel(kernel_cmd, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/processproxies/processproxy.py", line 212, in launch_kernel
    return launch_kernel(cmd, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/jupyter_client/launcher.py", line 135, in launch_kernel
    proc = Popen(cmd, **kwargs)
  File "/opt/conda/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.7/subprocess.py", line 1482, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
[E 211221 08:32:55 web:2250] 500 POST /api/kernels (XXX.XXX.XXX.XXX) 38905.91ms
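A note on reading the traceback: Errno 12 is ENOMEM, and it is raised from the `Popen` call in `jupyter_client/launcher.py`, that is, while the gateway process tries to spawn a local child process. That suggests memory pressure inside the enterprise-gateway container itself, although the log alone does not prove it. The sketch below shows one way to tell an ENOMEM failure apart from other spawn errors. The helper names `describe_launch_failure` and `try_launch` are hypothetical and not part of Enterprise Gateway or jupyter_client.

```python
import errno
import subprocess
import sys


def describe_launch_failure(exc: OSError) -> str:
    """Return a short diagnosis for an OSError raised while spawning a process."""
    if exc.errno == errno.ENOMEM:
        # Same condition as "OSError: [Errno 12] Cannot allocate memory".
        return "fork failed: the parent process could not allocate memory"
    return f"spawn failed: {exc.strerror}"


def try_launch(cmd):
    """Spawn cmd; return (process, None) on success or (None, diagnosis) on OSError."""
    try:
        return subprocess.Popen(cmd), None
    except OSError as exc:
        return None, describe_launch_failure(exc)


if __name__ == "__main__":
    proc, problem = try_launch([sys.executable, "-c", "pass"])
    if problem is None:
        proc.wait()
        print("launched fine")
    else:
        print(problem)
```

Wrapping the launch this way only improves the diagnosis; it does not free any memory.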
Environment
- Enterprise Gateway Version: 2.3.0
- Platform: Kubernetes
- Others: Spark 3.0.1
- Enterprise-gateway resources: {"requests": {"cpu": "500m", "memory": "512Mi"}, "limits": {"cpu": "1500m", "memory": "2Gi"}}
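For reference, the memory request and limit above can be converted to bytes for comparison with what the container actually uses. This is a minimal sketch that handles only the binary suffixes (Ki, Mi, Gi, Ti) and plain integers. Kubernetes also accepts decimal suffixes such as M and G, which this does not cover. The function name `memory_to_bytes` is made up for illustration.

```python
import json

# Binary (power-of-two) suffixes used by Kubernetes memory quantities.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "512Mi" or "2Gi" to a byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain byte count


# The resources block reported in the Environment section.
resources = json.loads(
    '{"requests": {"cpu": "500m", "memory": "512Mi"},'
    ' "limits": {"cpu": "1500m", "memory": "2Gi"}}'
)
request_bytes = memory_to_bytes(resources["requests"]["memory"])
limit_bytes = memory_to_bytes(resources["limits"]["memory"])
print(request_bytes, limit_bytes)  # prints: 536870912 2147483648
```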
Hmm - have you tried removing your resource thresholds, or increasing them, in order to determine if those are interfering with the creation of the kernel pod?
Hi @Zhurik - have you made progress with this? What came of removing (or increasing) your resource thresholds?
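One way to check whether the gateway is running close to its memory limit before it starts failing is to read the cgroup memory files from inside the enterprise-gateway container. The sketch below assumes the standard Linux cgroup layouts (v2: `memory.current` and `memory.max`; v1: `memory/memory.usage_in_bytes` and `memory/memory.limit_in_bytes`) mounted under `/sys/fs/cgroup`. The function name `read_memory` is hypothetical, and the mount point may differ in your image.

```python
from pathlib import Path
from typing import Optional, Tuple

# (usage file, limit file) relative to the cgroup root: cgroup v2 first, then v1.
_CANDIDATES = [
    ("memory.current", "memory.max"),
    ("memory/memory.usage_in_bytes", "memory/memory.limit_in_bytes"),
]


def read_memory(cgroup_root: str = "/sys/fs/cgroup") -> Optional[Tuple[int, Optional[int]]]:
    """Return (usage_bytes, limit_bytes) as seen by the current container.

    limit_bytes is None when cgroup v2 reports "max" (no limit set).
    Returns None when neither the v2 nor the v1 files are present.
    """
    root = Path(cgroup_root)
    for usage_name, limit_name in _CANDIDATES:
        usage_file, limit_file = root / usage_name, root / limit_name
        if usage_file.is_file() and limit_file.is_file():
            usage = int(usage_file.read_text().strip())
            raw_limit = limit_file.read_text().strip()
            limit = None if raw_limit == "max" else int(raw_limit)
            return usage, limit
    return None


if __name__ == "__main__":
    result = read_memory()
    if result is None:
        print("no cgroup memory files found")
    else:
        usage, limit = result
        print(f"usage={usage} limit={limit}")
```

If usage climbs toward the limit over time and only drops after a restart, that would be consistent with the behavior described in the issue and would support raising or removing the limit as suggested above.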