datahub
Database profiling allow_deny_patterns: table deny does not work.
The deny list in profiling allow_deny_patterns does not seem to work for a PostgreSQL database: the table is still profiled and its statistics are shown on the stats page.
Follow the steps below to reproduce the issue, where the table-level deny is not applied.
The goal is to ingest the table but not profile it.
Table creation statement
Create multiple tables with suffixes _2 ... _5
CREATE TABLE IF NOT EXISTS public.test_to_exclude_table_from_datahub
(
    id integer NOT NULL,
    val character varying(255) COLLATE pg_catalog."default" NOT NULL,
    time_stamp timestamp without time zone NOT NULL DEFAULT clock_timestamp(),
    CONSTRAINT test_to_exclude_table_from_datahub_pkey PRIMARY KEY (id)
);
ALTER TABLE IF EXISTS public.test_to_exclude_table_from_datahub
    OWNER to postgres;
Populate some data
insert into public.test_to_exclude_table_from_datahub
select
    a.n
    ,cast(concat(cast('val -' as varchar), to_char(a.n, '099999')) as varchar(255))
    ,now()
from generate_series(40001,50000) as a(n);
Use the profiling pattern below
profiling:
  enabled: true # default false
  ...
  allow_deny_patterns:
    # allow:
    #   - .*
    deny:
      - 'dvdrental.public.test_to_exclude_table_from_datahub*'
    ignoreCase: True
    alphabet: '[A-Za-z0-9 .-]'
Expected behaviour
The table is ingested but not profiled.
Observed behaviour
DataHub shows the tables as both ingested and profiled.

FYI https://github.com/datahub-project/datahub/blob/33339e2c8933bb3b989b4052ed1b3d308624f2a0/metadata-ingestion/src/datahub/ingestion/source/ge_profiling_config.py#L80 Thanks!
@rsontam-tc thanks for calling this out - looks like our docs are incorrect.
You should use profile_pattern instead of the profiling.allow_deny_patterns option.
This is covered in https://github.com/datahub-project/datahub/issues/5590. Closing this bug.
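For anyone landing here with the same setup, a minimal sketch of what the suggested change might look like in a recipe. It assumes that profile_pattern sits at the source config level (next to profiling, not inside it) and that it is matched against the fully qualified database.schema.table name, as the pattern in the original report implies. Connection settings are left out, and the escaped dots and trailing .* are an illustrative tightening of the reporter's regex, not something confirmed in this thread.

```yaml
source:
  type: postgres
  config:
    # connection settings omitted
    profiling:
      enabled: true          # profiling stays on for all other tables
    # Assumption: profile_pattern is a sibling of `profiling`,
    # not nested under it like the allow_deny_patterns option above.
    profile_pattern:
      deny:
        # Assumed to match database.schema.table; covers the _2 ... _5 tables too
        - 'dvdrental\.public\.test_to_exclude_table_from_datahub.*'
```

With a recipe along these lines, the tables should still be ingested (table filtering is separate) while being skipped by the profiler. Check the linked issue for the confirmed behaviour.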