jupyter_server
Handling large number of files
At the moment, when a user opens a folder from Notebook or JupyterLab, jupyter_server reads every file inside the folder using os.lstat, which is very costly for a large number of files.
https://github.com/jupyter-server/jupyter_server/blob/51e3ec362b2b12af48f0e101959c4cbec9d5cb33/jupyter_server/services/contents/filemanager.py#L262-L271
This makes it effectively impossible to open a folder containing a large number of files: the backend freezes for a long time before becoming responsive again. And even when the backend does return the data, the frontend crashes while rendering all the files. See https://github.com/jupyterlab/jupyterlab/issues/8700
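The per-entry stat pattern described above can be sketched as follows. This is a simplified illustration of the problem, not the actual jupyter_server code:

```python
import os

def list_dir_stat(path):
    """List a directory, calling os.lstat once per entry.

    Simplified sketch of the pattern described above: with N entries
    this performs N lstat system calls before anything is returned,
    so the request blocks the server for the whole scan on large folders.
    """
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        try:
            st = os.lstat(full)  # one system call per entry
        except OSError:
            continue  # skip entries that vanish mid-scan
        entries.append({"name": name, "size": st.st_size})
    return entries
```

Because the whole list is built eagerly, the response time grows linearly with the folder size, and the frontend receives the entire listing in one payload.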
It would be nice to improve this architecture, using pagination or another method to read the directory contents partially.
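A paginated listing could look roughly like the sketch below. The `offset`/`limit` parameter names are hypothetical, not part of any existing jupyter_server API; the point is that os.scandir yields entries lazily, so entries outside the requested page are never stat'ed:

```python
import itertools
import os

def list_dir_page(path, offset=0, limit=100):
    """Hypothetical paginated directory listing.

    Only the entries in the requested [offset, offset + limit) window
    are stat'ed, so the cost per request is bounded by `limit` rather
    than by the total number of files in the folder.
    """
    page = []
    with os.scandir(path) as it:
        for entry in itertools.islice(it, offset, offset + limit):
            st = entry.stat(follow_symlinks=False)
            page.append({"name": entry.name, "size": st.st_size})
    return page
```

Note that os.scandir gives no ordering guarantee, so a real API would also need a stable sort or cursor to make pages consistent across requests.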
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
I created a draft pull request https://github.com/jupyter-server/jupyter_server/pull/539
Together with my other commit https://github.com/cnydw/jupyterlab/commit/6e615c058d9b9e27caeba405b7c3f32446d90214 on the JupyterLab frontend, it can open a folder containing 100,000 files without problems.

The two commits I made are just a POC; the API changes can certainly be improved. I think it makes sense to first make the backend API changes in jupyter_server, then propagate the frontend changes to JupyterLab and Jupyter Notebook accordingly.
@fcollonval @telamonian
Hi, any updates on getting this merged?