
Add Import operator

ozkatz opened this issue on Oct 31, 2023 · 3 comments

Objective: Allow users to periodically import existing data from an object store into a lakeFS repository.

How: Using Airflow as an orchestrator, users can create DAGs that trigger on an interval (every hour, day, etc.). By introducing a lakeFS Import operator, users can automate the required metadata ingestion without having to interface directly with the lakeFS API, e.g.:

# Proposed usage; the import path below is illustrative, since the
# operator does not exist yet.
from lakefs_provider.operators.import_operator import LakeFSImportOperator

task = LakeFSImportOperator(
    task_id='import_from_s3',
    repo='my-repository',
    branch='main',
    commit_message='daily import of raw events',
    paths=[{
        'type': 'prefix',
        'source': 's3://my-bucket/events/ingested/',
        'destination': 'datasets/events/',
    }],
)
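For context, the operator would slot into a scheduled DAG like any other Airflow task. Below is a minimal sketch of a daily import DAG, assuming the operator signature from the example above; the DAG id and schedule are illustrative, not part of the proposal:

from datetime import datetime

from airflow import DAG

with DAG(
    dag_id='lakefs_daily_import',   # illustrative name
    start_date=datetime(2023, 10, 1),
    schedule_interval='@daily',     # re-run the import once a day
    catchup=False,
) as dag:
    import_task = LakeFSImportOperator(
        task_id='import_from_s3',
        repo='my-repository',
        branch='main',
        commit_message='daily import of raw events',
        paths=[{
            'type': 'prefix',
            'source': 's3://my-bucket/events/ingested/',
            'destination': 'datasets/events/',
        }],
    )

Each scheduled run would then ingest any new objects under the source prefix and commit them to the target branch in a single task, with no direct lakeFS API calls in user code.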
