dlt
rest_api: Clarify how we specify incremental loading
Source name
rest_api
Describe the data you'd like to see
Currently, the declarative rest_api source offers two APIs to specify incremental data loads:
- declaring one parameter of type incremental (query parameter level in the config dictionary)
- declaring an incremental load at the resource level
Example for method 1:
"resources": [
    {
        "name": "posts",
        "endpoint": {
            "params": {
                "limit": 100,
                "since": {
                    "type": "incremental",
                    "cursor_path": "updated_at",
                    "initial_value": "2024-01-01",
                    "end_value": "2024-01-31",
                    "transform": callback,
                },
            },
        },
    },
],
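To make the limitation of method 1 concrete, here is a minimal sketch in plain Python. It is not the actual rest_api implementation; the helper name resolve_param_level and its last_value argument are hypothetical. Because the incremental dict is itself the value of the since parameter, resolving it fills exactly one query parameter. The end_value can bound the cursor, but there is no slot to send it to the API.

```python
from typing import Any, Callable, Dict, Optional


def resolve_param_level(
    params: Dict[str, Any], last_value: Optional[Any] = None
) -> Dict[str, Any]:
    """Hypothetical: replace each param-level incremental spec with a concrete value."""
    resolved: Dict[str, Any] = {}
    for name, value in params.items():
        if isinstance(value, dict) and value.get("type") == "incremental":
            # Use the last seen cursor value, or initial_value on the first run.
            cursor_value = last_value if last_value is not None else value["initial_value"]
            transform: Optional[Callable[[Any], Any]] = value.get("transform")
            resolved[name] = transform(cursor_value) if transform else cursor_value
            # end_value is not mapped to any query parameter in this layout.
        else:
            resolved[name] = value
    return resolved


params = {
    "limit": 100,
    "since": {
        "type": "incremental",
        "cursor_path": "updated_at",
        "initial_value": "2024-01-01",
        "end_value": "2024-01-31",
    },
}
print(resolve_param_level(params))
# {'limit': 100, 'since': '2024-01-01'}
```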
Example for method 2:
"resources": [
    {
        "name": "posts",
        "endpoint": {
            "incremental": {
                "start_param": "since",
                "end_param": "until",
                "cursor_path": "updated_at",
                "initial_value": "2024-01-01",
                "end_value": "2024-01-31",
                "transform": callback,
            },
        },
    },
],
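For comparison, a similar hypothetical sketch for method 2 (again plain Python, not the real rest_api code; resolve_resource_level is an illustrative name, and applying the transform to the end value as well is an assumption). Since the start and end parameters are named separately, both bounds can be forwarded to the API.

```python
from typing import Any, Callable, Dict, Optional


def resolve_resource_level(
    incremental: Dict[str, Any], last_value: Optional[Any] = None
) -> Dict[str, Any]:
    """Hypothetical: turn a resource-level incremental config into query parameters."""
    # Fall back to an identity function when no transform is configured.
    transform: Callable[[Any], Any] = incremental.get("transform") or (lambda v: v)
    start = last_value if last_value is not None else incremental["initial_value"]
    params = {incremental["start_param"]: transform(start)}
    # With a separate end_param key, end_value can also be sent to the API.
    if incremental.get("end_param") and incremental.get("end_value") is not None:
        params[incremental["end_param"]] = transform(incremental["end_value"])
    return params


config = {
    "start_param": "since",
    "end_param": "until",
    "cursor_path": "updated_at",
    "initial_value": "2024-01-01",
    "end_value": "2024-01-31",
}
print(resolve_resource_level(config))
# {'since': '2024-01-01', 'until': '2024-01-31'}
```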
This poses some challenges:
- method 1 (query parameter level) has fewer features than method 2 (resource level) because method 1 does not support end_param. The reason is that the incremental config is nested as the value of the start parameter (since in the example), so there is no place to declare a separate end parameter.
- The code for the two methods is redundant
- users might be confused by having two APIs for the same thing, one of which is slightly less powerful
- the rest_api source creates a dlt.sources.Incremental. However, the current integration with that incremental class might not be ideal, because the rest_api source, rather than the Incremental, holds and applies the transform function, which allows value transformations such as epoch to datetime.
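As an illustration of the kind of callback meant here, a minimal sketch of an epoch-to-datetime transform. The function name epoch_to_iso is hypothetical, and it assumes the cursor holds Unix seconds while the API expects an ISO 8601 string. In the current design a function like this travels alongside the incremental config and is applied by the rest_api source, not by the Incremental itself.

```python
from datetime import datetime, timezone


def epoch_to_iso(value: int) -> str:
    """Hypothetical transform: convert Unix epoch seconds to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()


# Assumed usage: the converted value is what ends up in the request, e.g.
# {"since": epoch_to_iso(1704067200)}
print(epoch_to_iso(1704067200))
# 2024-01-01T00:00:00+00:00
```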
Proposal
- Provide only one way to specify incremental loads, or make both ways equally powerful
- Move the transformation function into the dlt.sources.Incremental class and make it a method. Users can then pass either an instance of dlt.sources.Incremental or a dictionary.
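A rough sketch of what the proposal could look like, using a hypothetical stand-in class rather than the real dlt.sources.Incremental (whose constructor is not reproduced here). The field name convert, the helper as_incremental, and the mapping of the dict key "transform" onto convert are all assumptions. transform is a method that defaults to identity, and the source accepts either an instance or a plain dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Union


@dataclass
class Incremental:
    """Hypothetical stand-in for dlt.sources.Incremental with transform as a method."""

    cursor_path: str
    initial_value: Optional[Any] = None
    end_value: Optional[Any] = None
    start_param: Optional[str] = None
    end_param: Optional[str] = None
    convert: Optional[Callable[[Any], Any]] = None

    def transform(self, value: Any) -> Any:
        # Identity unless a conversion callback was supplied.
        return self.convert(value) if self.convert else value


def as_incremental(spec: Union[Incremental, Dict[str, Any]]) -> Incremental:
    """Accept either an Incremental instance or a plain config dictionary."""
    if isinstance(spec, Incremental):
        return spec
    # Drop the marker key used by the param-level syntax, if present.
    fields = {k: v for k, v in spec.items() if k != "type"}
    # Map the dict key "transform" onto the callback field.
    if "transform" in fields:
        fields["convert"] = fields.pop("transform")
    return Incremental(**fields)
```

Keeping the callback as a field behind a transform method also leaves room for users to subclass and override transform instead of passing a function.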
Are you a dlt user?
Yes, I'm already a dlt user.
Are you ready to contribute this extension?
Yes, I'm ready.