Power BI Provider

Open ambika-garg opened this issue 10 months ago • 6 comments

Apache Airflow Provider for Power BI.

Operators

PowerBIDatasetRefreshOperator

The operator triggers a Power BI dataset refresh and pushes the refresh details to XCom. It accepts the following parameters:

  • dataset_id: The dataset ID.
  • group_id: The workspace ID.
  • wait_for_termination: (Default value: True) Wait until the pre-existing or newly triggered refresh completes before exiting.
  • force_refresh: When enabled, forces a new refresh of the dataset after a pre-existing, ongoing refresh request is terminated.
  • timeout: Time in seconds to wait for the dataset refresh to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.
  • check_interval: Number of seconds to wait before rechecking the refresh status.
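
To illustrate how timeout and check_interval might interact when wait_for_termination is True, here is a minimal sketch of a blocking polling loop. It is not the operator's actual implementation: wait_for_refresh, get_status, and TERMINAL_STATUSES are hypothetical names, and treating every status other than Unknown as terminal is an assumption based on the status list in this issue.

```python
import time
from typing import Callable

# Assumption: every documented status except "Unknown" counts as terminal.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}


def wait_for_refresh(
    get_status: Callable[[], str],
    timeout: float,
    check_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status.

    get_status is a hypothetical callable standing in for a call to the
    Power BI REST API. Raises TimeoutError if waiting one more
    check_interval would exceed the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next check would land after the deadline.
        if time.monotonic() + check_interval > deadline:
            raise TimeoutError(
                f"Refresh did not reach a terminal status within {timeout} seconds"
            )
        sleep(check_interval)
```

A loop like this occupies a worker slot for the whole wait, which is why a small check_interval mostly affects how quickly completion is noticed rather than how long the task runs.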

Hooks

PowerBI Hook

A hook to interact with Power BI.

  • powerbi_conn_id: Airflow Connection ID that contains the connection information for the Power BI account used for authentication.

Features

  • XCom Integration: The Power BI dataset refresh operator pushes the following fields to XCom for downstream tasks:

  1. powerbi_dataset_refresh_id: Request Id of the Dataset Refresh.
  2. powerbi_dataset_refresh_status: Refresh Status.
    • Unknown: Refresh state is unknown or a refresh is in progress.
    • Completed: Refresh successfully completed.
    • Failed: Refresh failed (details in powerbi_dataset_refresh_error).
    • Disabled: Refresh is disabled by a selective refresh.
  3. powerbi_dataset_refresh_end_time: The end date and time of the refresh (may be None if a refresh is in progress).
  4. powerbi_dataset_refresh_error: Failure error code in JSON format (None if no error).
  • External Monitoring link: The operator conveniently provides a redirect link to the Power BI UI for monitoring refreshes.
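
A downstream task could consume these fields roughly as follows. This is a sketch under assumptions: it presumes the field names listed above are the XCom keys, and summarize_refresh and pull are hypothetical names introduced here, not part of the provider.

```python
from typing import Any, Callable

# Assumption: the field names documented above are used as the XCom keys.
XCOM_KEYS = (
    "powerbi_dataset_refresh_id",
    "powerbi_dataset_refresh_status",
    "powerbi_dataset_refresh_end_time",
    "powerbi_dataset_refresh_error",
)


def summarize_refresh(pull: Callable[[str], Any]) -> str:
    """Build a one-line summary from the refresh XCom fields.

    pull is a hypothetical callable that maps an XCom key to its value.
    Raises RuntimeError when the refresh status is "Failed".
    """
    values = {key: pull(key) for key in XCOM_KEYS}
    refresh_id = values["powerbi_dataset_refresh_id"]
    status = values["powerbi_dataset_refresh_status"]
    if status == "Failed":
        error = values["powerbi_dataset_refresh_error"]
        raise RuntimeError(f"Refresh {refresh_id} failed: {error}")
    if status == "Completed":
        end_time = values["powerbi_dataset_refresh_end_time"]
        return f"Refresh {refresh_id} completed at {end_time}"
    # "Unknown" (possibly still in progress) or "Disabled".
    return f"Refresh {refresh_id} has status {status}"
```

Inside an Airflow task, pull could be something like `lambda key: ti.xcom_pull(task_ids="refresh_in_given_workspace", key=key)`, where ti is the task instance from the task context.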

Sample DAG to use the plugin.

Check out the sample DAG code below:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from operators.powerbi_refresh_dataset_operator import PowerBIDatasetRefreshOperator


with DAG(
        dag_id='refresh_dataset_powerbi',
        schedule_interval=None,
        start_date=datetime(2023, 8, 7),
        catchup=False,
        concurrency=20,
        tags=['powerbi', 'dataset', 'refresh']
) as dag:

    # Trigger the refresh without waiting for it to finish.
    refresh_in_given_workspace = PowerBIDatasetRefreshOperator(
        task_id="refresh_in_given_workspace",
        dataset_id="<dataset_id>",
        group_id="<workspace_id>",
        force_refresh=False,
        wait_for_termination=False,
    )

    refresh_in_given_workspace

ambika-garg avatar Apr 24 '24 19:04 ambika-garg

Just a kind reminder that a proposal to add a new provider should be announced with a justification: why you think the provider cannot be released and maintained outside of the community set of providers. There should be a thread on the devlist, and the community should reach consensus that we want it.

See https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers for details.

Here is an example of a discussion and proposal about a new provider (you can search for others):

  • Discussion: https://lists.apache.org/thread/0d669fmy4hn29h5c0wj0ottdskd77ktp
  • Resulting VOTE thread: https://lists.apache.org/thread/skh32jksvcf4yx4fhhsfz8lq6w5nhfjc

potiuk avatar Apr 25 '24 11:04 potiuk

A PowerBi / Microsoft Fabric provider would be really nice! We (Infrabel) started to work on that, via @dabla's MsGraph Operators.

@ambika-garg I contacted you on Airflow's Slack. I'd like to discuss the further plans for this provider and, possibly, how we can collaborate.

Joffreybvn avatar Apr 29 '24 10:04 Joffreybvn

This provider is a specialized operator for refreshing Power BI datasets. The MSGraphAsyncOperator (with its trigger and sensor) lets you achieve the same result without a dedicated operator, but you then need to combine several operators. Nonetheless, this would be a handy addition, since it combines triggering the dataset refresh and polling its status in a single operator. I agree with @Joffreybvn that this is a good opportunity to collaborate and to make sure this operator reuses as much common code as possible with the MSGraphAsyncOperator (for example, the KiotaRequestAdapterHook could be shared). The polling could then be done asynchronously, so that we don't block the Airflow workers while waiting for a response from the Power BI REST API.

dabla avatar Apr 29 '24 12:04 dabla

After discussing with @Joffreybvn, I'm considering integrating this operator into Azure Data Factory using a generic connection type. Given that Power BI seems to be integrated into Microsoft Fabric, I plan to develop a Fabric Provider in the future, allowing us to consolidate these operators within Fabric. What do you think about this strategy?

ambika-garg avatar May 15 '24 14:05 ambika-garg

Why not just add it to microsoft.azure then? Adding a new provider to manage is quite an overhead (and, to remind again, it would require a devlist discussion). Just adding an operator to microsoft.azure seems preferable, unless of course you think a Fabric provider deserves to be separate for some reason.

Also, I think starting a devlist discussion is a good way to see how Microsoft might support their providers via system tests, similarly to Amazon and (soon) Google. The Amazon and Google teams develop and maintain system tests / example DAGs that run in their own systems, and publish (Google soon) dashboards with the results, which gives the community (and the AWS/Google teams) the opportunity to see and fix any problems arising with real services. That might be a great opportunity for Microsoft to become more visible as a stakeholder and follow the good practices of others.

potiuk avatar May 19 '24 16:05 potiuk

Agreed, I'll submit this PR for the Power BI operator under microsoft.azure (initially, I mistakenly mentioned Azure Data Factory). So, let's go ahead and close this PR for now.

ambika-garg avatar May 20 '24 17:05 ambika-garg