smart_open
smart_open copied to clipboard
Read importlib.metadata to find transport / compressor extensions
Title
Read importlib.metadata to find transport / compressor extensions.
Motivation
For now, if user install a package that provide new smart_open
transport mechanism, it cannot be automatically registered.
It would be great if smart_open
could be extended using setuptools / entry_points capabilities.
Example use case
Please note: schema is not relevant here.
In package smart_ext_transport
I have following file:
# smart_ext_transport/custom_transport.py
SCHEMA = "custom"
...
Current situation
The only way to load extension for now are listed bellow:
Option 1: using register_transport
everywhere and every time it might be required :cry:
register_transport("smart_ext_transport.custom_transport")
Option 2: using __init__.py
and import
statement:
Adding:
# smart_ext_transport/__init__.py
register_transport("smart_ext_transport.custom_transport")
Then every time custom
schema might be needed:
import smart_ext_transport
Improved situation with setuptools
In setup.py
from smart_ext_transport
we may add:
from setuptools import setup
setup(
name="smart_ext_transport",
packages=["smart_ext_transport"],
entry_points={"smart_open_transport": [
"custom = smart_ext_transport.custom_transport"],
},
)
Once package is installed, smart_open
can automatically load every plugin that is registered using setuptools entry points.
User will no longer have to import package to register extension manually.
This is how pytest / setuptools CLI / flake8 / jupyter nb convert works for example.
Handling correct version of importlib.metadata
is mainly inspired from pytest
codebase.
See my issue about this PR: #691
What it may change for the future ?
Future extension to smart_open
may be moved out of core library.
Example use case may be new extensions libraries like:
-
smart_open_zstd
to addzstandard
compress -
smart_open_xxx_cloud
to add new cloud provider without having to implementing it in core library (like AliCloud).
Core library will only have to maintain base / robust API and no longer have to support all possible URI provider / compression standard.
Checklist
Before you create the PR, please make sure you have:
- [x] Picked a concise, informative and complete title
- [x] Clearly explained the motivation behind the PR
- [x] Linked to any existing issues that your PR will be solving
- [x] Included tests for any new functionality
- [x] Checked that all unit tests pass