new feature: Split Python binding into separate packages
Feature Description
The OpenDAL Python binding currently ships all of its services in a single package, which makes it difficult to use and extend. I propose splitting it into separate packages, similar to what I've done with opendalfs.
graph TD;
opendalfs.OpendalFileSystem -- import --> MemoryFileSystem;
opendalfs.OpendalFileSystem -- import --> S3FileSystem;
opendalfs.OpendalFileSystem -- import --> FsFileSystem;
opendalfs.OpendalFileSystem -- import --> ...FileSystem;
MemoryFileSystem -- use --> opendalfs-core;
S3FileSystem -- use --> opendalfs-core;
FsFileSystem -- use --> opendalfs-core;
...FileSystem -- use --> opendalfs-core;
opendalfs-core -- use -->opendal["Apache OpenDAL"];
Problem and Solution
OpenDAL Python is large yet still doesn't cover all the services users need.
We can divide it into multiple packages, making opendal a virtual meta-package that only provides the Python API and imports the matching service package on demand.
For example, as I demonstrated in opendalfs:
https://github.com/fsspec/opendalfs/blob/e19d28eb9f82e285685f91b3c80805146759b7d7/opendalfs/fs.py#L8-L25
    # Assumes `import importlib` at module level.
    def __init__(self, scheme, *args, **kwargs):
        super().__init__(*args, **kwargs)
        try:
            # Load the module dynamically based on scheme
            module = importlib.import_module(f"opendalfs_service_{scheme}")
            # Get the file system class based on scheme
            fs_class = getattr(module, f"{scheme.capitalize()}FileSystem")
            # Initialize the file system with the kwargs
            self.fs = fs_class(**kwargs)
        except ImportError as e:
            # Report the same module name that was actually imported
            raise ImportError(
                f"Cannot import opendalfs_service_{scheme}, please check if the module exists"
            ) from e
        except AttributeError as e:
            raise AttributeError(
                f"Cannot find {scheme.capitalize()}FileSystem in opendalfs_service_{scheme}"
            ) from e
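To make the on-demand import easier to try out in isolation, here is a minimal, self-contained sketch of the same idea as a standalone helper. The helper name `load_service_class` and the stand-in `opendalfs_service_memory` module registered in `sys.modules` are assumptions for illustration only; in a real setup that module would come from an installed service package.

```python
import importlib
import sys
import types


def load_service_class(scheme, prefix="opendalfs_service_"):
    """Import `<prefix><scheme>` and return its `<Scheme>FileSystem` class.

    Hypothetical helper: the name and signature are illustrative only.
    """
    module_name = f"{prefix}{scheme}"
    class_name = f"{scheme.capitalize()}FileSystem"
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Cannot import {module_name}; is the service package installed?"
        ) from exc
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise AttributeError(f"Cannot find {class_name} in {module_name}") from exc


# Stand-in for an installed service package (for demonstration only).
class MemoryFileSystem:
    def __init__(self, **kwargs):
        self.options = kwargs


_fake_module = types.ModuleType("opendalfs_service_memory")
_fake_module.MemoryFileSystem = MemoryFileSystem
sys.modules["opendalfs_service_memory"] = _fake_module

# The scheme selects the package; kwargs are forwarded to the service class.
fs = load_service_class("memory")(root="/tmp")
```

Separating the import failure from the attribute lookup keeps the two error cases distinct, so a missing package is not confused with a package that lacks the expected class.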
Additional Context
No response
Are you willing to contribute to the development of this feature?
- [ ] Yes, I am willing to contribute to the development of this feature.
cc @Zheaoli, do you have any ideas?
Honestly, I don't think it should be split by service. That would just add lots of bloated binary wheels, each containing duplicated compiled machine code for opendal, tokio, and the TLS library, which is not great.
We could still have a default/core package that contains most of the services, but provide other non-default services as separate packages.
- opendal: pure Python meta-package
- opendal-core: default services; any service that does not require dynamic linking against system packages (like gssapi, krb5, etc.) can be enabled
- opendal-$service: other $service
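As a rough sketch of how the meta-package might decide where a service comes from under this layout: check the set of services built into the core package first, then fall back to a separate per-service package. Everything named here is an assumption for illustration: the `CORE_SERVICES` contents, the import names `opendal_core` and `opendal_<service>`, and the `resolve_provider` helper are not part of any existing OpenDAL API.

```python
import importlib.util

# Hypothetical: services assumed to be compiled into the default core package.
CORE_SERVICES = frozenset({"memory", "fs", "s3"})


def resolve_provider(scheme, core_services=CORE_SERVICES,
                     find_spec=importlib.util.find_spec):
    """Return the import name of the package assumed to provide `scheme`."""
    if scheme in core_services:
        return "opendal_core"
    # Non-default services are assumed to live in their own package.
    module_name = f"opendal_{scheme}"
    if find_spec(module_name) is not None:
        return module_name
    raise ImportError(
        f"Service {scheme!r} is not available; try installing opendal-{scheme}"
    )
```

`find_spec` only checks whether the module can be located, so the lookup does not load any native code until the service is actually used.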
The OpenDAL Rust core is being split into opendal-core (which will exclude tokio and TLS), opendal-services-*, and opendal-layers-*. I believe this change will help move this issue forward.