jupyter_server icon indicating copy to clipboard operation
jupyter_server copied to clipboard

proposal: merge LargeFileManager into FileContentsManager

Open dlqqq opened this issue 3 years ago • 4 comments

Problem

I was tracing the ContentsManager implementation to find out how I can attach some logic to its copy/rename/move methods. According the dev docs, the default implementation of the ContentsManager abstract class is FileContentsManager.

After wasting a lot of time looking through serverapp.py to find where FileContentsManager was being initialized, I found that FileContentsManager is not the default implementation, but LargeFileManager is, initialized here as a default value for the contents_manager_class traitlet:

    # REMOVE in VERSION 2.0
    # Temporarily allow content managers to inherit from the 'notebook'
    # package. We will deprecate this in the next major release.
    contents_manager_class = TypeFromClasses(
        default_value=LargeFileManager,
        klasses=[
            "jupyter_server.services.contents.manager.ContentsManager",
            "notebook.services.contents.manager.ContentsManager",
        ],
        config=True,
        help=_i18n("The content manager class to use."),
    )

Digging through the source, it looks like LargeFileManager just extends the save method on FileContentsManager. That got me thinking: is there any reason to keep both LargeFileManager and FileContentsManager in our source tree? I simply don't see a use case where a developer would want to exclusively use FileContentsManager and ignore handling of large files. The extra level of inheritance makes the save() logic harder to trace, and just seems unnecessary. The 2.0 release is a good opportunity to clean up the codebase.

Proposed Solution

Merge the two classes together, and drop LargeFileManager. Same goes for AsyncLargeFileManager. More precisely, we can add a private method FileContentsManager#_save_large_file() and modify the existing FileContentsManager to have another block:

chunk = model.get("chunk", None)
if chunk is not None:
    self._save_large_file(...)
    return model

If we're against adding more logic to the 900-line FileContentsManager class, we can migrate file save logic to a helper or util function somewhere else. I'm just against inheritance being the favored approach for separating application logic here.

P.S.

One final note: can we drop "notebook.services.contents.manager.ContentsManager" from klasses in the traitlet definition?

dlqqq avatar Jul 01 '22 16:07 dlqqq

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively. welcome You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar Jul 01 '22 16:07 welcome[bot]

I might be out of scope, but at the last Jupyter Server meeting @afshin suggested using something like WebDAV instead of having our own custom implementation of a contents API. I also opened an issue in Jupyverse.

davidbrochart avatar Jul 01 '22 16:07 davidbrochart

Thanks for opening @dlqqq!

After wasting a lot of time

Sorry to hear that!

The 2.0 release is a good opportunity to clean up the codebase

Sure. This seems like a reasonable idea to me. Would you like to make a PR?

We should put a deprecation cycle on the class name in case anyone is inheriting from the FileContentsManager and not the LargeFileContentsManager.

If we're against adding more logic to the 900-line FileContentsManager class

I don't think the number lines was ever the reason these classes weren't combined; rather, I think it came from a place of caution when this feature was added (5 years ago). We wanted to allow people to opt-out of the LargeFileContentsManager if they needed to (for whatever reason).

I'm just against inheritance being the favored approach for separating application logic here.

"Separating application logic" wasn't the reason these two classes were separated in the first place—i.e. it wasn't just to make the logic live in separate files so that Jupyter Server developers didn't have to look at many lines of code—rather, it was developed this way to provide two different layers/hooks on which developers could build custom contents managers to extend and configure its functionality. This inherit, extend, and override/configure model is what allows developers to customize Jupyter Server. We provide some default implementations of each manager (at various layers) for developers to build on top of those layers.

That said, I agree that this particular example of layering on logic to the ContentsManager seems unnecessary. I don't see much of a use-case for the FileContentsManager.

Zsailer avatar Jul 06 '22 22:07 Zsailer

#929 reminds me that we should also consider switching the default to the async form of whatever ContentsManager subclass we land on for our 2.0 release since this is probably the time to do it.

kevin-bates avatar Jul 19 '22 14:07 kevin-bates