
Databricks table scan for Purview capability + ARM templates for ADF deployment

Open abdale opened this issue 4 years ago • 3 comments

This PR includes:

  • Databricks notebook to scan tables and push to Purview
  • ARM templates for an ADF pipeline that orchestrates runs of this Databricks notebook (some values in the template still need to be specified)

abdale avatar Feb 23 '21 19:02 abdale

@marvinbuss all schema validation checks now pass.

A few things that need your input/support:

(1) The new data factory has these seven parameters:

  1. tenantId
  2. purviewClientId
  3. purviewAccountName
  4. dataLandingZoneName
  5. databricksWorkspaceUrl
  6. purviewSecretPath
  7. databricksAccessToken

We need to update the config and yml files so that these parameters are supplied at deployment.
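As a rough illustration of what supplying these parameters could look like, the sketch below assembles an ARM deployment parameters document for the seven names listed above. The helper `build_parameters_file` is hypothetical and not part of this PR, and every value passed to it is a caller-supplied placeholder; only the parameter names come from the list.

```python
import json

# The seven parameter names are taken from the list above. Everything else
# in this sketch (helper name, placeholder values) is an assumption.
PARAMETER_NAMES = [
    "tenantId",
    "purviewClientId",
    "purviewAccountName",
    "dataLandingZoneName",
    "databricksWorkspaceUrl",
    "purviewSecretPath",
    "databricksAccessToken",
]


def build_parameters_file(values):
    """Return an ARM deployment parameters document as a dict.

    Raises ValueError if any of the seven parameters is missing, so a
    pipeline fails early instead of deploying a half-configured factory.
    """
    missing = [name for name in PARAMETER_NAMES if name not in values]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": values[name]} for name in PARAMETER_NAMES},
    }


if __name__ == "__main__":
    # Placeholder values only; real values would come from the config/yml files.
    placeholders = {name: f"<{name}>" for name in PARAMETER_NAMES}
    print(json.dumps(build_parameters_file(placeholders), indent=2))
```

In a real deployment the access token and secret path would more likely be injected from pipeline secrets than written to a file; the sketch only shows the shape of the document.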

(2) The Key Vault that stores the secrets still needs to be configured.

(3) The path to the Databricks notebook needs to be updated in the deployment template (line 278).

(4) We need a service principal with read access to Purview; its credentials will be supplied through parameters 2 and 6 (purviewClientId and purviewSecretPath).
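For context, a service principal typically authenticates with the OAuth 2.0 client-credentials flow. The sketch below only builds the token request and does not send it. The function name `build_token_request` is hypothetical, the scope `https://purview.azure.net/.default` is an assumption about the Purview resource identifier, and the client secret is assumed to have been read already from wherever purviewSecretPath points; none of this is confirmed by the notebook in this PR.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed scope for the Purview data plane; verify before relying on it.
PURVIEW_SCOPE = "https://purview.azure.net/.default"


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": PURVIEW_SCOPE,
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the JSON response of a successful request carries an `access_token` field.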

abdale avatar Mar 02 '21 23:03 abdale

FYI - the Hive Metastore connector is in preview and also supports the Databricks metastore. https://docs.microsoft.com/en-us/azure/purview/register-scan-hive-metastore-source

renepajta avatar Aug 02 '21 19:08 renepajta

@renepajta That is the reason why this PR was not merged. We probably have to work on some automation for this rather soon.

marvinbuss avatar Aug 03 '21 06:08 marvinbuss