
Patient operator

Open · elliotdes opened this issue 3 years ago · 9 comments

Adds a FivetranPatientOperator that waits for the Fivetran sync to finish before marking the task as completed. This is needed in deployments where Airflow's parallelism is limited.
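A minimal sketch of the trigger-then-wait idea in plain Python. This is not the code from the PR: the two callables stand in for whatever client makes the Fivetran API calls, and the sketch assumes the connector details expose `succeeded_at` / `failed_at` as ISO-8601 timestamps. Because the same task both triggers and waits, no second task needs a free slot to observe the result.

```python
import time
from datetime import datetime
from typing import Callable, Optional


def parse_ts(ts: Optional[str]) -> Optional[datetime]:
    # Accept ISO-8601 strings ending in "Z"; datetime.fromisoformat() only
    # understands an explicit "+00:00" offset before Python 3.11.
    return None if ts is None else datetime.fromisoformat(ts.replace("Z", "+00:00"))


def last_completion(details: dict) -> Optional[datetime]:
    # Most recent terminal event (success or failure) the connector reports.
    # Assumes the fields are named succeeded_at / failed_at.
    times = [parse_ts(details.get(key)) for key in ("succeeded_at", "failed_at")]
    times = [t for t in times if t is not None]
    return max(times) if times else None


def run_sync_and_wait(
    trigger_sync: Callable[[], None],
    get_connector: Callable[[], dict],
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> datetime:
    """Trigger a sync, then block until a completion newer than the pre-trigger one appears."""
    # Capture the baseline *before* triggering, so a sync that has already
    # finished by the time of the first poll is still recognised.
    baseline = last_completion(get_connector())
    trigger_sync()
    deadline = clock() + timeout
    while True:
        details = get_connector()
        latest = last_completion(details)
        if latest is not None and (baseline is None or latest > baseline):
            # The newest terminal event is a failure: surface it as an error.
            if latest == parse_ts(details.get("failed_at")):
                raise RuntimeError(f"sync failed at {latest.isoformat()}")
            return latest
        if clock() >= deadline:
            raise TimeoutError(f"no new completion within {timeout} seconds")
        sleep(poll_interval)
```

Inside Airflow, a loop like this would sit in a custom operator's `execute()` method; that wiring, and the real API client, are left out of the sketch.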

elliotdes · Jan 25 '22

Thank you @elliotdes, looking into this (https://github.com/fivetran/airflow-provider-fivetran/pull/56) now.

PubChimps · Jan 26 '22

@denimalpaca we have to use this operator in our DAGs because our Airflow instance has limited parallelism.

The FivetranSensor works by comparing the Fivetran connector's initial status against the status seen on later pokes. If the FivetranSensor starts after the connector has already finished its sync, the sensor hangs. This is what was happening to us.
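To make the race concrete, here is a simplified, hypothetical model of a change-detecting sensor next to one that compares against a baseline captured before the sync was triggered. The class names are made up for illustration and are not the provider's classes; `succeeded_at` is assumed to be an ISO-8601 timestamp from the connector details.

```python
from datetime import datetime
from typing import Optional


def parse_ts(ts: Optional[str]) -> Optional[datetime]:
    # Normalise a trailing "Z" so fromisoformat() accepts it on older Pythons.
    return None if ts is None else datetime.fromisoformat(ts.replace("Z", "+00:00"))


class ChangeDetectingSensor:
    """Simplified model of the behaviour described above: remember the value
    seen on the first poke and succeed only when a later poke differs."""

    def __init__(self) -> None:
        self._initial: Optional[str] = None
        self._seen_first = False

    def poke(self, details: dict) -> bool:
        current = details.get("succeeded_at")
        if not self._seen_first:
            # If the sync already finished, this "initial" value is the
            # completion we are waiting for, so no change will ever follow.
            self._initial, self._seen_first = current, True
            return False
        return current != self._initial


class BaselineSensor:
    """Alternative: compare against a timestamp captured *before* the sync was
    triggered (hypothetically handed over from the triggering task)."""

    def __init__(self, baseline: Optional[str]) -> None:
        self._baseline = parse_ts(baseline)

    def poke(self, details: dict) -> bool:
        current = parse_ts(details.get("succeeded_at"))
        return current is not None and (self._baseline is None or current > self._baseline)
```

For a sync that finished before the first poke, `ChangeDetectingSensor` returns False on every poke, while `BaselineSensor` succeeds immediately as long as its baseline predates the trigger. How the baseline would travel from the trigger task to the sensor (for example via XCom) is left open here.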

This FivetranPatientOperator would not be needed if the Fivetran API returned a unique ID for each triggered run that could then be checked explicitly. This is how dbt Cloud does it, and that works quite well for us.
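For comparison, a sketch of the run-ID pattern against a hypothetical in-memory API. `FakeRunApi`, `wait_for_run` and the state names are all illustrative; they are not the dbt Cloud or Fivetran API. Because the poller asks about one specific run, it gets the right answer even if it only starts after that run has finished.

```python
import itertools
import time
from typing import Callable, Dict

# Hypothetical terminal states for illustration only.
TERMINAL_STATES = {"success", "error", "cancelled"}


class FakeRunApi:
    """In-memory stand-in for an API whose trigger call returns a unique run ID."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.runs: Dict[int, str] = {}

    def trigger(self) -> int:
        run_id = next(self._next_id)
        self.runs[run_id] = "running"
        return run_id

    def status(self, run_id: int) -> str:
        return self.runs[run_id]


def wait_for_run(
    api: FakeRunApi,
    run_id: int,
    max_polls: int = 120,
    poll_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    # Poll one specific run: other runs cannot be mistaken for it,
    # so it does not matter how late the polling starts.
    for _ in range(max_polls):
        state = api.status(run_id)
        if state in TERMINAL_STATES:
            return state
        sleep(poll_interval)
    raise TimeoutError(f"run {run_id} not finished after {max_polls} polls")
```

Here a run that finished before polling began is reported as finished on the first poll, which is exactly the case the change-detecting sensor cannot handle.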

elliotdes · Feb 24 '22

+1 on this: we are also seeing this issue, where the FivetranSensor starts after the connector has already finished its sync and so keeps reporting sync_state = scheduled.

kieronellis · Apr 22 '22

@PubChimps do you have any input here on keeping the sensor from starting after the connector has finished? My only thought is to structure the DAG differently: have the sensor and the task that triggers the connector run in parallel, then set the downstream task to trigger_rule=all_success.

denimalpaca · Apr 22 '22

@denimalpaca is there a way to guarantee that two tasks run in parallel? Take an extreme example where parallelism is set to 1 in Airflow: I'm not sure both tasks would be triggered at the same time.

elliotdes · Jun 10 '22

@elliotdes if you have parallelism set to 1, or are using the SequentialExecutor, then no, you can't have the tasks running in parallel. However, if you have the infrastructure for parallelism, you should be able to use priority weighting to make them run simultaneously.

denimalpaca · Jun 10 '22

@denimalpaca would this operator be worth merging, then, for people limited to a parallelism of 1 or the SequentialExecutor?

elliotdes · Jun 17 '22

@elliotdes ideally people aren't running production pipelines with the SequentialExecutor, but for those who are, yes, this will probably be useful!

denimalpaca · Jun 17 '22

Hi all, sorry for the delay here, but could you please check v1.1.1 of the provider? @kieronellis this will solve the issue you described.

PubChimps · Jun 24 '22