cocalc-kubernetes
CoCalc project restarts. How to investigate?
We are experimenting with CoCalc (a slightly slimmed-down image with fewer kernels and increased memory defaults) for remote teaching and pair programming. (It works pretty well!) I am currently seeing three different types of crashes and would like a hint on how to find out why each crash occurred, how I can fix it, and where to see the logs.
- Python kernel crashes. Seems to occur when I allocate too much memory in a numpy array for example. The relevant cell gets a red tag with the kernel killed message. All understandable, I can live with that. (Although I wouldn’t mind seeing this somewhere in some project admin/server admin logs.)
- Project Pod sometimes gets killed. All I see is a Killed event in `kubectl get events`. Doesn’t happen super often, so it is not too bad, but I’d still like to get an idea why.
- Project restarts without notice. Sometimes this happens every 10 minutes while people are working on a project, so it doesn’t seem to be some idle timeout. (I figured it’s not the worst thing that can happen for teaching, as it clears all hidden variables and gives the student a clean state. ;) ) This is the nastiest problem as the reason is very unclear to me and I wouldn’t know where to look (and which limit to increase).
Any hints?
It might be that just updating the images would fix the problem. I don't know. Note that I spent about a month last year creating cocalc-kubernetes based on how cocalc-docker worked, but we've had a grand total of zero customers for cocalc-kubernetes (compared to quite a few for cocalc-docker). Thus development on cocalc-kubernetes has stalled, due to lack of demand from serious customers.
Thanks for the info. We are already running an updated and slightly patched image (the `/health` endpoint wouldn’t work, causing even earlier crashes). I hadn’t looked into cocalc-docker though. Maybe it would already be sufficient for the next edition of our course.
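For the Killed events and unexplained restarts described above, one possible starting point is the termination record that Kubernetes keeps on the Pod object itself. The sketch below is not part of cocalc-kubernetes; it assumes `kubectl` is on the PATH, that the project container was restarted in place (so `lastState.terminated` is still populated), and the helper names `fetch_pod` and `termination_reports` are made up for illustration.

```python
import json
import subprocess


def termination_reports(pod):
    """Summarise why each container in a Pod object last terminated.

    `pod` is the parsed JSON of `kubectl get pod <name> -o json`.
    """
    reports = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present once a container has been
        # restarted inside the same Pod; a recreated Pod starts empty.
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is None:
            continue
        reports.append({
            "container": status["name"],
            "restarts": status.get("restartCount", 0),
            "reason": terminated.get("reason"),
            "exit_code": terminated.get("exitCode"),
            "finished_at": terminated.get("finishedAt"),
        })
    return reports


def fetch_pod(name, namespace="default"):
    """Fetch a Pod as a dict via kubectl (hypothetical helper)."""
    result = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Usage, with a placeholder pod name:
#   for report in termination_reports(fetch_pod("my-project-pod")):
#       print(report)
```

A reason of `OOMKilled` with exit code 137 (128 + SIGKILL) would suggest that the container hit its memory limit, in which case the limit to look at is the container's `resources.limits.memory`. If the list comes back empty, the Pod was probably deleted and recreated rather than restarted in place, and this record will not help.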