airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Microsoft Power BI operator to refresh the dataset

Open ambika-garg opened this issue 1 year ago • 3 comments

Airflow Power BI plugin.

Operators

PowerBIDatasetRefreshOperator

The operator triggers the Power BI dataset refresh and pushes the details of refresh in Xcom. It can accept the following parameters:

  • dataset_id: The dataset Id.
  • group_id: The workspace Id.
  • wait_for_termination: (Default value: True) Wait until the pre-existing or current triggered refresh completes before exiting.
  • force_refresh: When enabled, it will force refresh the dataset again, after pre-existing ongoing refresh request is terminated.
  • timeout: Time in seconds to wait for a dataset to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.
  • check_interval: Number of seconds to wait before rechecking the refresh status.

Hooks

PowerBI Hook

A hook to interact with Power BI.

  • powerbi_conn_id: Airflow Connection ID that contains the connection information for the Power BI account used for authentication.

Custom Connection form

Connection type: Power BI

You need to store following credentials:

  • client_id: The Client ID of your service principal.
  • client_secret: The Client Secret of your service principal.
  • tenant_id: The Tenant Id of your service principal.

Features

  • Xcom Integration: The Power BI Dataset refresh operator enriches the Xcom with essential fields for downstream tasks:

  1. powerbi_dataset_refresh_id: Request Id of the Dataset Refresh.
  2. powerbi_dataset_refresh_status: Refresh Status.
    • In Progress: Refresh state is unknown or a refresh is in progress.
    • Completed: Refresh successfully completed.
    • Failed: Refresh failed (details in powerbi_dataset_refresh_error).
    • Disabled: Refresh is disabled by a selective refresh.
  3. powerbi_dataset_refresh_end_time: The end date and time of the refresh (may be None if a refresh is in progress)
  4. powerbi_dataset_refresh_error: Failure error code in JSON format (None if no error)
  • External Monitoring link: The operator conveniently provides a redirect link to the Power BI UI for monitoring refreshes.

Sample DAG to use the plugin.

Check out the sample DAG code below:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from operators.powerbi_refresh_dataset_operator import PowerBIDatasetRefreshOperator


with DAG(
        dag_id='refresh_dataset_powerbi',
        schedule_interval=None,
        start_date=datetime(2023, 8, 7),
        catchup=False,
        concurrency=20,
        tags=['powerbi', 'dataset', 'refresh']
) as dag:

    refresh_in_given_workspace = PowerBIDatasetRefreshOperator(
        task_id="refresh_in_given_workspace",
        dataset_id="<dataset_id",
        group_id="workspace_id",
        force_refresh = False,
        wait_for_termination = False
    )

    refresh_in_given_workspace


^ Add meaningful description above Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

ambika-garg avatar Jun 20 '24 18:06 ambika-garg

Hi @dabla, I'm not sure how to incorporate the msgraph operator into it to extend this operator to work in deferable mode. I would be really grateful if you could please review my PR and guide me through the same?

ambika-garg avatar Jun 21 '24 17:06 ambika-garg

Hi @dabla, I'm not sure how to incorporate the msgraph operator into it to extend this operator to work in deferable mode. I would be really grateful if you could please review my PR and guide me through the same?

Hello @ambika-garg, I've reviewed your PR and added comments in the code that could be improved, I didn't see much differences with the previous PR though, except the code has moved to the Microsoft Azure provider.

dabla avatar Jun 24 '24 18:06 dabla

Thanks a lot, @dabla, for reviewing the PR. I was struggling with how to leverage KiotaAdapterHook in my operator, and your comments have given me some insights. I will make the suggested changes. Thanks again!

ambika-garg avatar Jun 29 '24 18:06 ambika-garg

Hey @dabla, I added the Triggers class to extend the operator to works in deferable mode while extending async calls to MS graph operator. now this operator could support both async and sync mode. Can you please review it?

ambika-garg avatar Jul 05 '24 20:07 ambika-garg

Hey @dabla, I added the Triggers class to extend the operator to works in deferable mode while extending async calls to MS graph operator. now this operator could support both async and sync mode. Can you please review it?

Hello @ambika-garg , I've done a small change and committed int into your PR. I also commented on the PowerBI, one last change I would like to see to make it better. Code is starting to look good! Also saw you did something similar for Fabric, which we would also be interested in using. Once this PR is done we could create a new one and merged that one also into the Azure provider package. Of course this will all have to be discussed with the Airflow maintainers, best would be start a dev discussion for this in the dev list.

dabla avatar Jul 18 '24 14:07 dabla

Hey @dabla, thank you so much for reviewing the PR. I deeply appreciate it! I had a quick question related to the comment you added about using more of deferrable mechanism.

I think we can convert the "trigger_dataset_refresh" function to sync mode, because no polling is required in this function, and it would be handy, if user doesn't want to wait for the termination. Also, This is how other azure operators are implemented like AzureDataFactoryOperator.

WDYT?

ambika-garg avatar Jul 20 '24 04:07 ambika-garg

I added some more tests, to ensure full code coverage, I'm pasting the codecov html report below. Screenshot 2024-07-29 at 2 24 34 PM

ambika-garg avatar Jul 29 '24 09:07 ambika-garg

Hello @ambika-garg , I really like how to code has evolved now, really nice work 💯

Did you test this code locally and does it work and behave correctly? I think I will also try to test it locally first before approving this PR as we will be also using this operator instead of using multiple msgraph operators to achieve the same. If tests are fine then I think we can finish this up and finally get it merged with approval of @potiuk or @eladkal ofc.

dabla avatar Aug 08 '24 06:08 dabla

Hey @dabla, Thank you for your consistent support and assistance—it truly made this possible. I’ve tested the code locally, and it’s working perfectly. All tests have passed, and the code coverage is above 90%. Thanks again!

ambika-garg avatar Aug 08 '24 07:08 ambika-garg

@ambika-garg I've fixed the dataset_refresh_example, still don't understand why other tests are failing, I'm experiencing the same problem in my PR...

dabla avatar Aug 09 '24 12:08 dabla

Yea, I'm able to pass all the tests locally. But didn't understand why they are failing on CI. Are tests getting failed locally as well on your end?

ambika-garg avatar Aug 09 '24 12:08 ambika-garg

Yea, I'm able to pass all the tests locally. But didn't understand why they are failing on CI. Are tests getting failed locally as well on your end?

Might be those are side effects from other tests leaking. You can rebuild your base image to make sure that you have the same dependencies and run the whole failing test suite with breeze testing tests --test-type (see what test type it is failing with in the CI.

potiuk avatar Aug 11 '24 19:08 potiuk

Hi @davidblain-infrabel, I noticed that two of the test cases for the MSGraph hook were failing in CI, so I fixed them. You can refer to the screenshot for details.

image

ambika-garg avatar Aug 12 '24 08:08 ambika-garg

Hey @dabla, Finally I am able to pass all the checks of the CI, please review this PR once, that would help us to move forward with merging.

ambika-garg avatar Aug 12 '24 19:08 ambika-garg

@dabla @ambika-garg -> any other remaining points? It looks good to me but maybe I missed someething

potiuk avatar Aug 14 '24 00:08 potiuk

Hey @potiuk & @dabla, I've reviewed, tested and updated the code to the best of my ability. It looks good to me now, and I believe there are no remaining issues.

ambika-garg avatar Aug 14 '24 04:08 ambika-garg

Hey @potiuk & @dabla, I've reviewed, tested and updated the code to the best of my ability. It looks good to me now, and I believe there are no remaining issues.

Hello @ambika-garg I see that @potiuk approved the PR so it's now a matter of time to have a good build and I suppose they will merge it.

dabla avatar Aug 14 '24 07:08 dabla

@dabla @ambika-garg -> any other remaining points? It looks good to me but maybe I missed someething

Also look good to me, @ambika-garg did a great job and I went over all files yesterday and looks good to me now. Depending on the order in which we merge this PR or mine with api version, it might require a a small modification but I'm aware of it and will apply it.

dabla avatar Aug 14 '24 12:08 dabla