New feature: Split Python binding into separate packages

Open Xuanwo opened this issue 1 year ago • 1 comments
Feature Description

The OpenDAL Python binding currently ships all of its services in a single package, which makes it difficult to use and extend. I propose splitting it into separate packages, similar to what I've done with opendalfs.

graph TD;
    opendalfs.OpendalFileSystem -- import --> MemoryFileSystem;
    opendalfs.OpendalFileSystem -- import --> S3FileSystem;
    opendalfs.OpendalFileSystem -- import --> FsFileSystem;
    opendalfs.OpendalFileSystem -- import --> ...FileSystem;
    MemoryFileSystem -- use --> opendalfs-core;
    S3FileSystem -- use --> opendalfs-core;
    FsFileSystem -- use --> opendalfs-core;
    ...FileSystem -- use --> opendalfs-core;
    opendalfs-core -- use -->opendal["Apache OpenDAL"];

Problem and Solution

OpenDAL Python is large yet still doesn't cover all the services users need.

We can divide it into multiple packages, making opendal a virtual meta-package that only provides the Python API and imports the correct service on demand when needed.

For example, as I showed in opendalfs:

https://github.com/fsspec/opendalfs/blob/e19d28eb9f82e285685f91b3c80805146759b7d7/opendalfs/fs.py#L8-L25

    def __init__(self, scheme, *args, **kwargs):
        super().__init__(*args, **kwargs)

        try:
            # Load the module dynamically based on scheme
            module = importlib.import_module(f"opendalfs_service_{scheme}")
            # Get the file system class based on scheme
            fs_class = getattr(module, f"{scheme.capitalize()}FileSystem")
            # initialize the file system with the kwargs
            self.fs = fs_class(**kwargs)
        except ImportError:
            raise ImportError(
                f"Cannot import opendal_service_{scheme}, please check if the module exists"
            )
        except AttributeError:
            raise AttributeError(
                f"Cannot find {scheme.capitalize()}FileSystem in opendal_service_{scheme}"
            )

Additional Context

No response

Are you willing to contribute to the development of this feature?

  • [ ] Yes, I am willing to contribute to the development of this feature.

Xuanwo avatar Jul 29 '24 12:07 Xuanwo

cc @Zheaoli, do you have any ideas?

Xuanwo avatar Oct 22 '24 10:10 Xuanwo

Honestly, I don't think it should be split by service. That would just add lots of bloated binary wheels, each containing duplicated machine code for opendal, tokio, and the TLS library, which is not great.

We could still have a default/core package that contains most of the services, but provide other non-default services as separate packages.

  • opendal: pure python meta package
    • opendal-core: default services; any service that does not require dynamic linking against system packages (like gssapi, krb5, etc.) can be enabled
    • opendal-$service: other $service
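
A minimal sketch of how the pure-Python meta package could dispatch under this layout. Everything here is an assumption for illustration: the module names `opendal_core` and `opendal_<scheme>`, the contents of `CORE_SCHEMES`, and the helper names are hypothetical and not part of any released OpenDAL API.

```python
import importlib

# Hypothetical: schemes bundled in the default "opendal-core" wheel.
CORE_SCHEMES = frozenset({"memory", "fs", "s3"})


def resolve_service_module(scheme, core_schemes=CORE_SCHEMES):
    """Return the module name expected to provide `scheme` (names are assumed)."""
    if scheme in core_schemes:
        return "opendal_core"
    return f"opendal_{scheme}"


def load_service(scheme, importer=importlib.import_module):
    """Import the module backing `scheme`, with an install hint on failure."""
    module_name = resolve_service_module(scheme)
    try:
        return importer(module_name)
    except ImportError as exc:
        # Distribution names conventionally use dashes instead of underscores.
        package = module_name.replace("_", "-")
        raise ImportError(
            f"Service {scheme!r} needs the {package!r} package; "
            f"try `pip install {package}`"
        ) from exc
```

With this shape, default services resolve to the single core wheel, and only the non-default services need their own binary package. The `importer` parameter exists only so the lookup can be exercised without the real packages installed.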

messense avatar Jan 08 '25 03:01 messense

The OpenDAL Rust core is being split into opendal-core (which will exclude tokio and TLS), opendal-services-*, and opendal-layers-*. I believe this change will help move this issue forward.

Xuanwo avatar Jan 08 '25 03:01 Xuanwo