amazon-redshift-utils
amazon-redshift-utils copied to clipboard
Finding dependencies of materialized views
Hi there.
The v_view_dependency script: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminViews/v_view_dependency.sql#L1 does not work for materialized views.
I had a table that would not drop without 'cascade'. I could not find a dependency via the view. Dropping the table I discovered a materialized view was dropped.
It would be useful if we could use the v_view_dependency
view for materialized views.
thanks
It appears that all the views, find_depend and admin views for constraint and view dependency fail to list the source schema and table when it comes to materialized views. Here's an example:
- Created table public.test1
- Created schema private
- Create materialized view private.test1_pmv as select * from public.test1
- Queried find_depend (https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_TABLE.html) and v_view_dependency. Following were the results:
dev=# select * from v_view_dependency where dependent_objectname='test1_pmv';
src_oid | src_schemaname | src_objectname | dependent_viewoid | dependent
_schemaname | dependent_objectname
---------+----------------+----------------------+-------------------+----------
------------+----------------------
329361 | private | mv_tbl__test1_pmv__0 | 329364 | private
| test1_pmv
(1 row)
dev=# select * from find_depend where refbyname='test1_pmv'; tbloid | schemaname | name | refbyschemaname | refbyname | viewoid --------+------------+----------------------+-----------------+-----------+--------- 376 | pg_catalog | pg_xactlock | private | test1_pmv | 329364 329361 | private | mv_tbl__test1_pmv__0 | private | test1_pmv | 329364 329364 | private | test1_pmv | private | test1_pmv | 329364 (3 rows)
As evident above, the views fail to list public.test1 as the source schema/object. We probably need modification to the existing scripts to account for such scenarios? It looks like the only way to check for mv dependencies is to look at the view definition... A direct query also work:
select oid, relname from pg_class where oid in (select objid from pg_depend where refobjid = <source table OID>);
While this has not been fixed. Anyone who makes it here may wish to look at https://stackoverflow.com/a/62337897/11395802 for a way to determine if a materialized view has the desired table in its definition.
select schemaname, viewname from pg_views where schemaname not like 'pg_catalog' and schemaname not like 'information_schema' and definition like '%<tablename>%';
Hi, I stumbled upon the same issue. @jallender answer is good if you need to know if there are any dependencies for a specific table but I needed the dependencies for any table in one schema. I ended up with this solution:
create view admin.v_view_dependencies_fixed as (
with h1 as (
select generate_series as i
from generate_series(1, 100) -- we can use this since the query only touches the leader node
),
h2 as (
select schemaname as dependent_schema,
viewname as dependent_view,
schemaname || '.' || viewname as dependent_full,
regexp_substr(definition, 'schema_name\\.\\w+', 1, i) as dependency
from pg_views
cross join h1
where schemaname = 'schema_name'
and dependency is not null
and dependency != ''
)
select distinct
dependent_full,
dependent_schema,
dependent_view,
dependency as source_full,
split_part(dependency, '.', 1) as source_schema,
split_part(dependency, '.', 2) as source_object
from h2
where dependent_full != source_full
);
I expanded on the previous commenter's view definition to handle a few extra things:
- Some of the materialized views in my Redshift cluster take advantage of a custom
search_path
so not all table references included a schema qualifier. For this view, I added some logic to search not only for (probably) column definitions but also for (possibly unqualified) relation references infrom
andjoin
clauses too. - This version searches all MV definitions for any references to any non-system catalog tables, rather than requiring a hardcoded reference to a single schema.
- I dropped the
generate_series
piece since I felt any single reference to any of the relevant schemas, tables, or columns was sufficient for my use case.
create view admin.v_mv_dependency as (
with schemas as (
-- select all non-system, non-administrative schemas
select nspname
from pg_namespace
where nspname not like 'pg_%'
and nspname not in ('information_schema', 'admin')
),
schema_rels as (
-- select all tables and views from all eligible schemas
(
select nspname as schemaname,
pgv.viewname as relname,
'view' as relkind
from schemas
join pg_views pgv on (schemas.nspname = pgv.schemaname)
)
union
(
select nspname as schemaname,
pgt.tablename as relname,
'table' as relkind
from schemas
join pg_tables pgt on (schemas.nspname = pgt.schemaname)
)
),
mviews as (
-- select all materialized view definitions from the MV list
select schemaname,
viewname,
definition
from pg_views
-- not exactly foolproof, but should be pretty decent at catching only MVs
where definition ilike '%create materialized view%'
),
possible_mview_dep_matches as (
-- compute a list of possible matches just using simple substring heuristic
-- on either schema name or relation (view/table) name
select mvs.schemaname,
mvs.viewname,
any_value(mvs.definition) as definition,
src.schemaname as objschema,
relname as objname,
any_value(relkind) as objkind
from mviews mvs
cross join schema_rels src
where objname is not null
and (strpos(mvs.definition, src.schemaname) != 0 or
strpos(mvs.definition, src.relname) != 0)
group by 1, 2, 4, 5
),
mv_ref_matches as (
(
-- from the set of possible matches, find any possible table name matches
-- by checking for FROM {table} and JOIN {table} within the view definition
select vw.schemaname as dependent_schema,
vw.viewname as dependent_view,
vw.schemaname || '.' || vw.viewname as dependent_full,
vw.objschema as source_schema,
vw.objname as source_object,
vw.objkind as source_kind,
regexp_substr(vw.definition,
'(from|join)[[:blank:]]+(' || vw.objschema || '\\.)?(' || vw.objname || ')', 1, 1,
'i') as dependency_match,
'relation' as match_kind
from possible_mview_dep_matches as vw
where dependency_match is not null
and dependency_match != ''
)
union
(
-- from the set of possible matches, find any possible column name matches
-- by checking for ({schema}.)?{table}.{column}
select vw.schemaname as dependent_schema,
vw.viewname as dependent_view,
vw.schemaname || '.' || vw.viewname as dependent_full,
vw.objschema as source_schema,
vw.objname as source_object,
vw.objkind as source_kind,
regexp_substr(vw.definition,
'(' || vw.objschema || '\\.)?(' || vw.objname || ')\\.\\w+', 1, 1,
'i') as dependency_match,
'column' as match_kind
from possible_mview_dep_matches as vw
where dependency_match is not null
and dependency_match != ''
)
)
select distinct dependent_full,
dependent_schema,
dependent_view,
dependency_match as source_full,
source_schema,
source_object,
source_kind,
match_kind
from mv_ref_matches
where dependent_full != source_full
);
There are some limitations with my approach which may make it unacceptable for some users:
- The search is (more) exhaustive than the previous approach since it cross joins all materialized views with every (schema, table) combination, but this means it could take considerably longer. In my database, we only have around 15-20 materialized views and a couple hundred tables across several schemas so this ran in an acceptable timeframe.
- I tried to reduce runtime by doing some super basic
strpos
checks before running the more complex regular expression over the view defintiion, but I suspect this won't do much for a database with either a high number of MVs or tables.
- I tried to reduce runtime by doing some super basic
- This pattern cannot account for columns
select *
, but hopefully it can at least help with identifying tables using the(from|join)
heuristic. Sadly, we still have to use regular expressions rather thanpg_depend
but 🤷
@chrisrink10 thank you for putting this together!
@chrisrink10 Thanks.
I have enhanced the regex for 'relation'. My regex skills are not perfect so feel free to check these before using...
- prevent matching on partial names. e.g. schema.mytable should not match schema.mytable but not schema.mytable2
- consider double quote identifiers
- nuance for end of lines
'(from|join)([[:blank:]]|\s)+((")?' || vw.objschema || '(")?\.)?((")?' || vw.objname || '(")?)([[:blank:]]|\s)'
Also looked at the 'column' regex.
- Added the quote identifier consideration
- Moved the final \. inside of object name group
'((")?' || vw.objschema || '(")?\.)((")?' || vw.objname || '(")?\.)\w+'
Cannot say that I have tested all scenarios - but working for me so far ....