pangeo-cloud-federation
Google Cloud Filestore out of space
We are out of space again on our shared NFS Filestore instance, so Jupyter pods can't start:
```
[I 2020-05-09 01:51:00.440 SingleUserNotebookApp notebookapp:1924] http://jupyter-0000-2d0001-2d5999-2d4917:8888/user/0000-0001-5999-4917/
[I 2020-05-09 01:51:00.440 SingleUserNotebookApp notebookapp:1925] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[E 2020-05-09 01:51:00.446 SingleUserNotebookApp notebookapp:1821] Failed to write server-info to /home/jovyan/.local/share/jupyter/runtime/nbserver-1.json: [Errno 28] No space left on device
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/bin/jupyterhub-singleuser", line 12, in <module>
    sys.exit(main())
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 660, in main
    return SingleUserNotebookApp.launch_instance(argv)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_core/application.py", line 270, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/traitlets/config/application.py", line 664, in launch_instance
    app.start()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 565, in start
    super(SingleUserNotebookApp, self).start()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/notebook/notebookapp.py", line 1933, in start
    self.write_browser_open_file()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/notebook/notebookapp.py", line 1843, in write_browser_open_file
    self._write_browser_open_file(open_url, f)
OSError: [Errno 28] No space left on device
```
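`[Errno 28]` (`ENOSPC`) means the filesystem backing `/home/jovyan` is full. A quick way to confirm how full a mounted share is uses the standard library's `shutil.disk_usage`; this is a minimal sketch, and the helper name `report_free_space` is hypothetical.

```python
import shutil


def report_free_space(path: str) -> dict:
    """Summarize capacity of the filesystem that contains `path` (sizes in GB)."""
    # disk_usage returns a named tuple (total, used, free), all in bytes
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_gb": usage.total / 1e9,
        "used_gb": usage.used / 1e9,
        "free_gb": usage.free / 1e9,
        "percent_used": percent_used,
    }
```

Usage: call `report_free_space("/home/jovyan")` from a pod that does start, or pass the local mount point of the share on a VM that has it mounted.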
@jhamman, can you remind us how you diagnosed disk usage per user?
Happy Friday night, everyone! 🙃
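One way to get a per-user breakdown is to walk each user's folder and total its file sizes. This minimal Python sketch is not necessarily how @jhamman did it; it assumes the share is mounted locally and that each user's home directory is a top-level folder under the path you pass in (if user folders sit one level deeper, pass that subdirectory instead). `usage_by_user` is a hypothetical helper name.

```python
import os
from collections import Counter


def usage_by_user(root: str) -> Counter:
    """Total apparent file size (bytes) under each top-level directory of `root`.

    Assumes a layout of <root>/<username>/...; loose files directly in `root`
    are ignored.
    """
    totals = Counter()
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            size = 0
            # os.walk does not follow symlinked directories by default
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        # lstat: count a symlink itself, not the file it points to
                        size += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            totals[entry.name] = size
    return totals
```

Usage: `for user, nbytes in usage_by_user("/mnt/pangeo-filestore").most_common(20): print(f"{nbytes / 1e9:8.2f} GB  {user}")`. This reports apparent sizes, which can differ from the allocated blocks that `du` reports, and walking a multi-terabyte NFS share will take a while.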
I tried examining the filesystem myself, following these instructions to mount a Filestore share, but I got stuck:
```
sudo mkdir -p /mnt/pangeo-filestore
sudo mount 171.161.186:/test /mnt/pangeo-filestore
```
The mount command eventually timed out with:
```
mount.nfs: Network is unreachable
```
I must be doing something wrong, but can't figure out what.
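"Network is unreachable" is a network-level error rather than an NFS one. One quick check, run from the same VM, is whether the Filestore IP accepts TCP connections on port 2049, the standard NFS port. This is a minimal sketch; `nfs_port_reachable` and the `"<filestore-ip>"` placeholder are hypothetical. It only tests plain TCP reachability of that one port, not every port an NFS mount may use.

```python
import socket

NFS_PORT = 2049  # standard NFS TCP port


def nfs_port_reachable(host: str, port: int = NFS_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Network is unreachable", "Connection refused" and timeouts
        return False
```

Usage: `nfs_port_reachable("<filestore-ip>")`, substituting the instance's IP address. A `False` here points at routing, firewall, or VPC network placement rather than at the mount options.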
I fixed this very temporarily by increasing the Filestore capacity to 2.2 TB, but we really need to sort this out and find a better long-term solution for home directories.
Note to self: the compute instance must be in the same zone as the Filestore instance. The first time I tried it, my compute instance was in us-central1-a but the Filestore instance is in us-central1-b. With both in the same zone, things work.
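Google Cloud zone names typically have the form `<region>-<letter>`, so `us-central1-a` and `us-central1-b` are two different zones in the same region, `us-central1`. A small sketch with hypothetical helper names makes that distinction explicit when comparing where a VM and a Filestore instance live:

```python
def region_of(zone: str) -> str:
    """Strip the trailing zone letter, e.g. 'us-central1-a' -> 'us-central1'."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or len(suffix) != 1:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def compare_locations(vm_zone: str, filestore_zone: str) -> str:
    """Describe whether two zone names match at the zone or only the region level."""
    if vm_zone == filestore_zone:
        return "same zone"
    if region_of(vm_zone) == region_of(filestore_zone):
        return "same region, different zone"
    return "different region"
```

For the case above, `compare_locations("us-central1-a", "us-central1-b")` returns `"same region, different zone"`.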
I wrote a storage retention policy for the UC Berkeley hubs: https://docs.datahub.berkeley.edu/en/latest/topic/storage-retention.html. I'm going to implement it in code soon. Maybe we could adopt a similar policy?
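As a hypothetical first step toward a similar policy here, the sketch below lists home directories with no modifications within a given number of days. It does not restate the rules of the linked policy: it assumes one top-level directory per user, uses file modification times as a stand-in for user activity, and the helper names and 365-day threshold in the usage line are illustrative.

```python
import os
import time
from typing import List, Optional


def newest_mtime(path: str) -> float:
    """Most recent modification time of `path` or anything beneath it."""
    newest = os.lstat(path).st_mtime
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            try:
                newest = max(newest, os.lstat(os.path.join(dirpath, name)).st_mtime)
            except OSError:
                pass  # entry vanished or is unreadable; skip it
    return newest


def stale_home_dirs(root: str, max_age_days: float, now: Optional[float] = None) -> List[str]:
    """Sorted names of top-level dirs under `root` untouched for `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False) and newest_mtime(entry.path) < cutoff:
                stale.append(entry.name)
    return sorted(stale)
```

Usage: `stale_home_dirs("/mnt/pangeo-filestore", max_age_days=365)` returns candidates to review for archiving; it deletes nothing.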