qol: improve `rest_api` source public interface
Improve user experience and LLM code generation capabilities by simplifying import paths for the declarative REST API interface.
Problem
Currently, users need to rummage through 2 main modules:
from dlt.sources.rest_api import ...
from dlt.sources.helpers.rest_client import ...
Then, dlt.sources.helpers.rest_client contains 8 submodules, including some that should maybe be private (utils, typing).
Solution
- Centralize everything under the dlt.sources.rest_api module. It makes more sense to keep this one since it's the required import and the other contains helpers.
- Make helpers importable directly from dlt.sources.rest_api instead of submodules.
- This should be done by proxying classes and be completely backwards compatible. This would require an appropriate set of tests.
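The backwards-compatibility requirement could be checked with a test along these lines. This is only a sketch: the module names `legacy_rest_client` and `public_rest_api` are in-memory stand-ins (not real dlt modules), and `BearerToken` here is a dummy class, so the example runs without dlt installed.

```python
import importlib
import sys
import types


def register_module(name: str, **attrs: object) -> types.ModuleType:
    """Create an in-memory module and register it so imports can find it."""
    module = types.ModuleType(name)
    module.__dict__.update(attrs)
    sys.modules[name] = module
    return module


class BearerToken:
    """Dummy stand-in for a helper class; not the real dlt implementation."""

    def __init__(self, token: str) -> None:
        self.token = token


# Hypothetical "old" import path where users find the class today.
register_module("legacy_rest_client", BearerToken=BearerToken)
# Hypothetical "new" public module that proxies the very same class.
register_module("public_rest_api", BearerToken=BearerToken)


def same_object(old_path: str, new_path: str, name: str) -> bool:
    """Return True if both import paths expose the identical object."""
    old = getattr(importlib.import_module(old_path), name)
    new = getattr(importlib.import_module(new_path), name)
    return old is new


print(same_object("legacy_rest_client", "public_rest_api", "BearerToken"))
```

Checking identity (`is`) rather than equality matters here: existing `isinstance` checks and type annotations written against the old path keep working only if both paths return the same class object.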
Valid point.
Centralize everything under dlt.sources.rest_api module. It makes more sense to keep this one since it's the required import and the other contains helpers.
I have a different idea: dlt.sources.helpers.rest_client is a misnomer. It should be named http_client, since it's a more general-purpose HTTP client that could be used by non-REST sources (e.g., GraphQL). Putting it under the rest_client namespace is also not an ideal solution. I'm not very happy with helpers either, but it separates utility modules from the true sources under dlt.sources. Maybe these utilities need to go into dlt.common.
I'm not very happy with helpers either, but it separates utility modules from the true sources under dlt.sources. Maybe these utilities need to go into dlt.common.
We should be comfortable decoupling the "public interface", i.e., the code users write, from how we structure the code internally. For example, it doesn't sound crazy to me to have the following all import the same object (could be confusing though).
from dlt.sources.rest_api import BearerToken
from dlt.sources.graphql_api import BearerToken
from dlt.common.http.auth import BearerToken
I'm trying to improve the UX by ensuring that "when I want to build a REST API source, I can import all of the necessary things from a single place". I think the impact on LLM codegen could be very significant. Also, users get better IDE completion for their imports without having to guess the library structure.
How it works
For us, we are free to move where BearerToken is implemented, but the user-facing module always looks as follows:
# dlt/sources/rest_api/__init__.py
from dlt.sources.http.auth import BearerToken # could be anywhere
__all__ = (
...,
"BearerToken", # doesn't need to be defined in the current file
)
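To illustrate what the re-export pattern gives users, here is a small self-contained sketch. It assumes two in-memory modules, `demo_http_auth` and `demo_rest_api`, which are made-up names standing in for the implementation module and the public module; nothing here touches dlt itself.

```python
import sys
import types

# Hypothetical implementation module; its name and location are free to change.
IMPL_SOURCE = '''
class BearerToken:
    def __init__(self, token):
        self.token = token
'''

# Hypothetical public module: re-exports the class and lists it in __all__.
PUBLIC_SOURCE = '''
from demo_http_auth import BearerToken

__all__ = ("BearerToken",)
'''


def load_module(name: str, source: str) -> types.ModuleType:
    """Execute source code as an in-memory module and register it."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


impl = load_module("demo_http_auth", IMPL_SOURCE)
public = load_module("demo_rest_api", PUBLIC_SOURCE)

# Simulate user code doing a star-import from the public module.
user_namespace: dict = {}
exec("from demo_rest_api import *", user_namespace)

print(public.BearerToken is impl.BearerToken)
print(public.BearerToken.__module__)
print("BearerToken" in user_namespace)
```

One thing the sketch makes visible: the re-exported class still reports the implementation module in `__module__`, so reprs and tracebacks would show the internal location even when users import from the public path. Whether that is acceptable is a design choice for the maintainers.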