pbi-tools
Deploy profiles for Datasets using wildcard in path throws error
Version: 1.0.0-rc.2+preview.4
Issue Summary
Thin report deployment profiles allow wildcards in the folder path, making it possible for a single profile to deploy all thin .pbix files under that path (multiple reports).
The new "Dataset" deployment profile does not allow wildcards in the path forcing datasets (or thick reports) to be deployed one at a time, each having their own profiles defined in their own .pbixproj.json file instead of being deployed as a set within the same folder structure sharing a common deployment profile in a parent-level .pbixproj.json file.
Example Demonstrating the Issue
In this example, I have two thick reports that are contained in the reports folder:
- Placement Process
- Recruiter KPI Metrics
My .pbixproj.json file is also in the reports folder. The deploy profile and path are as follows:
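(Sketched below with placeholder names and credentials; the relevant part is the `"mode": "Dataset"` profile with a wildcard in `path`.)

```json
{
  "deployments": {
    "Datasets-Deploy": {
      "mode": "Dataset",
      "source": {
        "type": "Folder",
        "path": ".\\*"
      },
      "authentication": {
        "type": "ServicePrincipal",
        "tenantId": "<tenant-id>",
        "clientId": "<client-id>",
        "clientSecret": "<client-secret>"
      },
      "environments": {
        "Development": { "workspace": "Datasets [Dev]" }
      }
    }
  }
}
```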
Attempting to run a deploy on this profile results in an `Illegal characters in path` exception.
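For reference, the failing invocation was along these lines (label and environment are placeholders, assuming the `pbi-tools deploy <folder> <label> <environment>` form):

```sh
pbi-tools deploy . Datasets-Deploy Development
# -> System.ArgumentException: Illegal characters in path.
```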
Thanks. This is an awesome project!
@mthierba I am getting the same issue on the latest preview build; should we roll back to the previous one?
This is by design. Wildcards are not supported for datasets. Each dataset, in my view, is such a substantial stand-alone development project that it didn't seem sensible to handle them in bulk for deployments. I can't see a use case here. Happy for you to change my mind.
Ok, thank you for clarifying. In my use case, we have three workspaces for our shared datasets: Datasets [Dev], [UAT], and [Prod]. We have multiple shared datasets residing in the Datasets workspace, so we would like the deployment to push all datasets from the repo, not just one. How would I achieve that currently? I may be misunderstanding.
E.g. we have some datasets shared across business units, but also data marts for different business units, and so different star schemas / data models for each purpose-built data mart.
I guess I could possibly program the deployment to only deploy dataset folders with changed files. We mostly use DirectQuery, but I can see what you mean for import-mode datasets.
My argument justifying this wildcard feature is that different Power BI developers have different styles and business needs. Such a feature would make the pbi-tools project more versatile and inviting to a wider audience, leading to higher adoption in the community.

For example, in my case I have 10 stand-alone thick reports that share some concepts (concepts A, B, and C), but their datasets are distinctly different from each other. These differences don't support the need for shared datasets across thin reports. I can co-locate multiple reports that all fall under "concept A" in the same repo and folder structure, but because the current tool doesn't accept wildcards in the path, the deployment options are limited.

My goal is to deploy all the reports under the same concept at the same time into the same workspace environments; thus all of the attributes configured in the manifest files are the same for each of the reports. However, since wildcards are not supported, I'm not able to maintain a single, root-level pbixproj file that defines the deployment config for all the underlying reports in the repo. Instead, I must create a manifest file for each report, copying and pasting all of the same manifest attributes (with distinct profile names) into each report folder in the repo. This violates the DRY principle, creates additional code/config maintenance, and increases the risk of human error. Adding support for wildcards in the path would enable a single shared manifest for scenarios like this, mitigating the issues above.
I'd personally create a dedicated repo for each dataset in that case. Presumably, those datasets are developed independently? CI/CD pipelines generally have change triggers, and I wouldn't want a deployment of Datasets B and C triggered if only Dataset A had a change. In the scenario where all datasets reside in the same repository and share the same pipeline, that would be the consequence.
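For instance, an Azure Pipelines trigger scoped to a single dataset's folder would look like this (branch and folder names are placeholders), so a commit touching only Dataset A never fires the pipelines for B and C:

```yaml
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - datasets/DatasetA   # only changes under this folder trigger the pipeline
```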
I'm not sure that creating a separate repo for every dataset is the right solution; I prefer working out of one repo for a given team. I'm not sure how it would work for Power BI, but there are benefits to scheduling full deployments daily and/or weekly and building tests on top of that for ongoing quality control. It is a common practice in the data analytics space.
Redeploying DirectQuery datasets and thin reports completely is no issue, since the source control environment is the version of truth. Obviously, import models / thick reports would need some additional care to make sure the updates are metadata-only.
I think the right solution is possibly to use Azure DevOps (or the tool of choice) to orchestrate the deployment of various artifact groups. Also, it doesn't look like it would be too difficult to have a conditional script that only runs if files within a given folder (dataset or report workspace) have changed; this should just be a simple git command, I would think.
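A sketch of that conditional, assuming the diff is taken against the previous commit and using placeholder folder, profile, and environment names:

```sh
# Deploy the dataset only if files under its folder changed in the last commit.
if ! git diff --quiet HEAD~1 HEAD -- "datasets/SalesModel/"; then
  pbi-tools deploy datasets/SalesModel Dataset-Deploy "$ENVIRONMENT"  # $ENVIRONMENT is a placeholder
fi
```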
I now have a reports and a datasets folder in my repo. I have set up one deployment script for each dataset within the datasets folder, and one deployment script per report workspace that deploys all of that workspace's reports. So I have multiple rounds of deployment in the one YAML pipeline / repo that is triggered.
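Roughly like this (all labels and folder names are placeholders):

```yaml
steps:
  - script: pbi-tools deploy datasets/SalesModel Dataset-Deploy Development
    displayName: Deploy dataset - SalesModel
  - script: pbi-tools deploy datasets/HRModel Dataset-Deploy Development
    displayName: Deploy dataset - HRModel
  - script: pbi-tools deploy reports Reports-All Development
    displayName: Deploy thin reports (wildcard profile)
```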
I'm starting to see the light on the separate manifests, though. I'm guessing you can have a main manifest that handles general config, and then handle dataset-specific config in the individual dataset folders. This makes sense to me, since having just one could get very bloated.
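I.e. a layout along these lines (folder names are placeholders):

```
repo/
├── .pbixproj.json              # general/shared config at the root
├── datasets/
│   ├── SalesModel/
│   │   └── .pbixproj.json      # dataset-specific deploy profile
│   └── HRModel/
│       └── .pbixproj.json
└── reports/
    └── .pbixproj.json          # report profile; wildcard path supported here
```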