
Database profiling allow_deny_patterns: Table deny does not work.

Open rsontam-tc opened this issue 2 years ago • 2 comments

The profiling allow_deny_patterns option does not seem to work for deny on a PostgreSQL database: the data is still profiled and shown on the stats page.

You may follow the steps below to reproduce the issue where table-level deny is not working.

The goal is to ingest the table but not profile it.

Table creation statement. Create multiple tables with suffixes _2 ... _5:

CREATE TABLE IF NOT EXISTS public.test_to_exclude_table_from_datahub
(
    id integer NOT NULL,
    val character varying(255) COLLATE pg_catalog."default" NOT NULL,
    time_stamp timestamp without time zone NOT NULL DEFAULT clock_timestamp(),
    CONSTRAINT test_to_exclude_table_from_datahub_pkey PRIMARY KEY (id)
);

ALTER TABLE IF EXISTS public.test_to_exclude_table_from_datahub
    OWNER to postgres;

Populate some data

insert into public.test_to_exclude_table_from_datahub
select
	a.n
	,cast(concat(cast('val -' as varchar), to_char(a.n, '099999')) as varchar(255))
	,now()
from generate_series(40001,50000) as a(n);

Use profile pattern as below

    profiling:
      enabled: true # default false
      ...
      allow_deny_patterns:
        # allow: 
        #  - .*
        deny:
          - 'dvdrental.public.test_to_exclude_table_from_datahub*'
        ignoreCase: True
        alphabet: '[A-Za-z0-9 .-]'

Expected behaviour: the table is ingested but not profiled.

Observed behaviour: DataHub shows the tables as both ingested and profiled.


rsontam-tc, Aug 08 '22 21:08

FYI https://github.com/datahub-project/datahub/blob/33339e2c8933bb3b989b4052ed1b3d308624f2a0/metadata-ingestion/src/datahub/ingestion/source/ge_profiling_config.py#L80 Thanks!

rsontam-tc, Aug 08 '22 21:08

@rsontam-tc thanks for calling this out - looks like our docs are incorrect

You should use profile_pattern instead of the profiling.allow_deny_patterns option.
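As a rough sketch of what that change might look like in a recipe: profile_pattern sits next to the profiling block rather than inside it. The surrounding source/type/config wrapper, the escaped dots, and the trailing `.*` (intended to cover the _2 ... _5 tables) are assumptions for illustration, not taken from this thread, so check them against the docs for your DataHub version.

```yaml
source:
  type: postgres          # assumed source type, based on the PostgreSQL database in the report
  config:
    # connection settings omitted
    profiling:
      enabled: true       # profiling stays on for everything not denied below
    # profile_pattern is a sibling of profiling, not nested under it
    profile_pattern:
      deny:
        # regex on the fully qualified table name; pattern shape is an assumption
        - 'dvdrental\.public\.test_to_exclude_table_from_datahub.*'
      ignoreCase: true
```

With a layout like this, the denied tables should still be ingested, since profile_pattern is meant to restrict only which tables get profiled.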

hsheth2, Aug 08 '22 23:08

This is covered in https://github.com/datahub-project/datahub/issues/5590. Closing this bug.

rsontam-tc, Aug 18 '22 15:08