data-landing-zone
Databricks table scan for Purview + ARM templates for ADF deployment
This PR includes:
- Databricks notebook to scan tables and push to Purview
- ARM templates for an ADF pipeline that orchestrates runs of this Databricks notebook (some values in the templates still need to be filled in)
@marvinbuss all schema validation checks have passed.
A few things that need your input/support:
(1) The new data factory has these seven parameters:
- tenantId
- purviewClientId
- purviewAccountName
- dataLandingZoneName
- databricksWorkspaceUrl
- purviewSecretPath
- databricksAccessToken
We need to update the config and yml files so that these parameters are set during deployment.
(2) The Key Vault that stores the secrets needs to be configured.
(3) The path to the Databricks notebook needs to be updated in the deployment template (line 278).
(4) We need a service principal with read access to Purview; it will be used for parameters 2 and 6 (purviewClientId and purviewSecretPath).
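For item (1), here is a rough sketch of how a parameters file for the seven parameters could be generated. It is only an illustration, not what this PR ships: the helper `build_parameters`, all placeholder values, and the choice to pull only `databricksAccessToken` from Key Vault are assumptions. The `value` and `reference` shapes follow the standard ARM deployment parameters file format.

```python
import json

# The seven data factory parameters listed above.
PARAMETER_NAMES = [
    "tenantId",
    "purviewClientId",
    "purviewAccountName",
    "dataLandingZoneName",
    "databricksWorkspaceUrl",
    "purviewSecretPath",
    "databricksAccessToken",
]

PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/"
    "deploymentParameters.json#"
)


def build_parameters(values, key_vault_refs=None):
    """Build an ARM parameters document (hypothetical helper).

    values: parameter name -> plain value.
    key_vault_refs: parameter name -> (key vault resource id, secret name),
        for values that should be resolved from Key Vault at deployment time.
    """
    key_vault_refs = key_vault_refs or {}
    missing = [
        name for name in PARAMETER_NAMES
        if name not in values and name not in key_vault_refs
    ]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))

    parameters = {}
    for name in PARAMETER_NAMES:
        if name in key_vault_refs:
            vault_id, secret_name = key_vault_refs[name]
            parameters[name] = {
                "reference": {
                    "keyVault": {"id": vault_id},
                    "secretName": secret_name,
                }
            }
        else:
            parameters[name] = {"value": values[name]}

    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": parameters,
    }


if __name__ == "__main__":
    # Placeholder values only; replace with the real environment settings.
    doc = build_parameters(
        {
            "tenantId": "<tenant-id>",
            "purviewClientId": "<service-principal-client-id>",
            "purviewAccountName": "<purview-account>",
            "dataLandingZoneName": "<landing-zone-name>",
            "databricksWorkspaceUrl": "<workspace-url>",
            "purviewSecretPath": "<secret-path>",
        },
        key_vault_refs={
            "databricksAccessToken": ("<key-vault-resource-id>", "<secret-name>"),
        },
    )
    print(json.dumps(doc, indent=2))
```

Failing fast on a missing parameter keeps an incomplete file from reaching the deployment step.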
FYI: the Hive connector is in preview and also supports the Databricks metastore. https://docs.microsoft.com/en-us/azure/purview/register-scan-hive-metastore-source
@renepajta That is the reason why this PR was not merged. We probably have to work on some automation for this rather soon.