airflow-provider-lakeFS
Add Import operator
Objective: Allow users to periodically import existing data from an object store into a lakeFS repository.
How: Using Airflow as an orchestrator, users can create DAGs that trigger on an interval (every hour, day, etc.). By introducing a lakeFS Import operator, users could automate the required metadata ingestion without having to interface directly with the lakeFS API, e.g.:
task = LakeFSImportOperator(
    task_id='import_from_s3',
    repo='my-repository',
    branch='main',
    commit_message='daily import of raw events',
    paths=[{
        'type': 'prefix',
        'source': 's3://my-bucket/events/ingested/',
        'destination': 'datasets/events/',
    }],
)