
OSError when trying to create spark notebook

Open · Zhurik opened this issue 2 years ago · 2 comments

Description

After creating a new Jupyter notebook, I try to run some cells but receive "Error Starting Kernel {"Gateway": "http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888", "Error": "Timeout during request"} {"Gateway": "http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888", "Error": "Timeout during request"}"

In the enterprise-gateway logs I see "OSError: [Errno 12] Cannot allocate memory". Only restarting enterprise-gateway clears the error.

What can I do about this?

Screenshots / Logs

Logs from enterprise-gateway:

[E 211221 08:32:53 web:1792] Uncaught exception POST /api/kernels (XXX.XXX.XXX.XXX)
    HTTPServerRequest(protocol='http', host='enterprise-gateway.enterprise-gateway.svc.cluster.local:8888', method='POST', uri='/api/kernels', version='HTTP/1.1', remote_ip='1XXX.XXX.XXX.XXX')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/tornado/web.py", line 1703, in _execute
        result = await result
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/handlers.py", line 112, in post
        yield super(MainKernelHandler, self).post()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/notebook/services/kernels/handlers.py", line 46, in post
        kernel_id = yield maybe_future(km.start_kernel(kernel_name=model['name']))
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 151, in start_kernel
        kernel_id = await super(RemoteMappingKernelManager, self).start_kernel(*args, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/notebook/services/kernels/kernelmanager.py", line 176, in start_kernel
        kernel_id = await maybe_future(self.pinned_superclass.start_kernel(self, **kwargs))
      File "/opt/conda/lib/python3.7/site-packages/jupyter_client/multikernelmanager.py", line 426, in start_kernel
        await km.start_kernel(**kwargs)
      File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 350, in start_kernel
        await super(RemoteKernelManager, self).start_kernel(**kwargs)
      File "/opt/conda/lib/python3.7/site-packages/jupyter_client/manager.py", line 542, in start_kernel
        self.kernel = await self._launch_kernel(kernel_cmd, **kw)
      File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/kernels/remotemanager.py", line 406, in _launch_kernel
        proxy = await self.process_proxy.launch_process(kernel_cmd, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/processproxies/k8s.py", line 48, in launch_process
        await super(KubernetesProcessProxy, self).launch_process(kernel_cmd, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/processproxies/container.py", line 76, in launch_process
        self.local_proc = self.launch_kernel(kernel_cmd, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/enterprise_gateway/services/processproxies/processproxy.py", line 212, in launch_kernel
        return launch_kernel(cmd, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/jupyter_client/launcher.py", line 135, in launch_kernel
        proc = Popen(cmd, **kwargs)
      File "/opt/conda/lib/python3.7/subprocess.py", line 800, in __init__
        restore_signals, start_new_session)
      File "/opt/conda/lib/python3.7/subprocess.py", line 1482, in _execute_child
        restore_signals, start_new_session, preexec_fn)
    OSError: [Errno 12] Cannot allocate memory
[E 211221 08:32:55 web:2250] 500 POST /api/kernels (XXX.XXX.XXX.XXX) 38905.91ms
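
The traceback ends inside `subprocess.Popen`, so the failure happens in the gateway process itself while it tries to spawn the kernel launcher; errno 12 is ENOMEM, most likely raised at fork time. The following is a minimal sketch of how such a failure could be recognized and annotated with the parent's memory usage, assuming a Linux host. `is_enomem` and `launch_with_diagnostics` are hypothetical helper names for illustration and are not part of Enterprise Gateway or jupyter_client.

```python
import errno
import resource
import subprocess
import sys


def is_enomem(exc):
    """Return True if exc is an OSError carrying errno 12 (ENOMEM)."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOMEM


def launch_with_diagnostics(cmd):
    """Start cmd; if spawning fails with ENOMEM, log this process's peak RSS.

    Hypothetical wrapper: it only adds a diagnostic line and re-raises.
    """
    try:
        return subprocess.Popen(cmd)
    except OSError as exc:
        if is_enomem(exc):
            # On Linux, ru_maxrss is reported in kilobytes.
            peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            print(
                f"spawn failed with ENOMEM; parent peak RSS = {peak_kib} KiB",
                file=sys.stderr,
            )
        raise
```

A large peak RSS in the parent at the time of the failure would suggest that the gateway process has grown close to its memory limit, which would be consistent with a restart clearing the error.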

Environment

  • Enterprise Gateway Version: 2.3.0
  • Platform: Kubernetes
  • Others: Spark 3.0.1
  • Enterprise-gateway resources: {"requests": {"cpu": "500m", "memory": "512Mi"}, "limits": {"cpu": "1500m", "memory": "2Gi"}}
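
For reference, the memory values in the resources entry above can be converted to bytes as follows. This is only a sketch that handles a subset of the binary suffixes Kubernetes accepts; `memory_to_bytes` is a hypothetical helper, not a Kubernetes client API.

```python
import json

# Binary suffixes used by Kubernetes memory quantities (subset only).
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def memory_to_bytes(quantity):
    """Convert a quantity such as '512Mi' or '2Gi' to an integer byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain byte count


# The resources block exactly as reported in the issue.
resources = json.loads(
    '{"requests": {"cpu": "500m", "memory": "512Mi"},'
    ' "limits": {"cpu": "1500m", "memory": "2Gi"}}'
)
request_bytes = memory_to_bytes(resources["requests"]["memory"])
limit_bytes = memory_to_bytes(resources["limits"]["memory"])
```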

Zhurik, Dec 22 '21 08:12

Hmm - have you tried removing your resource thresholds, or increasing them, in order to determine if those are interfering with the creation of the kernel pod?
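
One way to check whether the limits are involved is to compare the container's current memory usage against its cgroup limit from inside the enterprise-gateway pod. The sketch below assumes the standard Linux cgroup file locations (v2: `memory.max` / `memory.current`; v1: `memory/memory.limit_in_bytes` / `memory/memory.usage_in_bytes`) mounted under `/sys/fs/cgroup`. `read_memory_headroom` is a hypothetical helper name.

```python
from pathlib import Path

# (limit file, usage file) pairs, relative to the cgroup mount point.
_CGROUP_FILES = [
    ("memory.max", "memory.current"),  # cgroup v2
    ("memory/memory.limit_in_bytes", "memory/memory.usage_in_bytes"),  # cgroup v1
]


def read_memory_headroom(cgroup_root="/sys/fs/cgroup"):
    """Return (usage_bytes, limit_bytes) for the current container.

    limit_bytes is None when cgroup v2 reports "max" (no limit).
    Returns None if no known cgroup memory files are found.
    """
    root = Path(cgroup_root)
    for limit_name, usage_name in _CGROUP_FILES:
        limit_path = root / limit_name
        usage_path = root / usage_name
        if limit_path.is_file() and usage_path.is_file():
            raw_limit = limit_path.read_text().strip()
            limit = None if raw_limit == "max" else int(raw_limit)
            usage = int(usage_path.read_text().strip())
            return usage, limit
    return None
```

If usage sits close to the limit shortly before the error appears, raising or removing the memory limit would be a reasonable next experiment.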

kevin-bates, Dec 30 '21 01:12

Hi @Zhurik - have you made progress with this? What came of removing (or increasing) your resource thresholds?

kevin-bates, May 20 '22 22:05