amazon-redshift-utils icon indicating copy to clipboard operation
amazon-redshift-utils copied to clipboard

Finding dependencies of materialized views

Open redthor opened this issue 4 years ago • 6 comments

Hi there.

The v_view_dependency script: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminViews/v_view_dependency.sql#L1 does not work for materialized views.

I had a table that would not drop without 'cascade'. I could not find a dependency via the view. Dropping the table I discovered a materialized view was dropped. It would be useful if we could use the v_view_dependency view for materialized views.

thanks

redthor avatar Jun 05 '20 12:06 redthor

It appears that all the views, find_depend and admin views for constraint and view dependency fail to list the source schema and table when it comes to materialized views. Here's an example:

  1. Created table public.test1
  2. Created schema private
  3. Create materialized view private.test1_pmv as select * from public.test1
  4. Queried find_depend (https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_TABLE.html) and v_view_dependency. Following were the results:

dev=# select * from v_view_dependency where dependent_objectname='test1_pmv'; src_oid | src_schemaname | src_objectname | dependent_viewoid | dependent _schemaname | dependent_objectname ---------+----------------+----------------------+-------------------+---------- ------------+---------------------- 329361 | private | mv_tbl__test1_pmv__0 | 329364 | private
| test1_pmv (1 row)

dev=# select * from find_depend where refbyname='test1_pmv'; tbloid | schemaname | name | refbyschemaname | refbyname | viewoid --------+------------+----------------------+-----------------+-----------+--------- 376 | pg_catalog | pg_xactlock | private | test1_pmv | 329364 329361 | private | mv_tbl__test1_pmv__0 | private | test1_pmv | 329364 329364 | private | test1_pmv | private | test1_pmv | 329364 (3 rows)

As evident above, the views fail to list public.test1 as the source schema/object. We probably need modification to the existing scripts to account for such scenarios? It looks like the only way to check for mv dependencies is to look at the view definition... A direct query also work:

select oid, relname from pg_class where oid in (select objid from pg_depend where refobjid = <source table OID>);

vitdexoff avatar Jun 12 '20 12:06 vitdexoff

While this has not been fixed. Anyone who makes it here may wish to look at https://stackoverflow.com/a/62337897/11395802 for a way to determine if a materialized view has the desired table in its definition.

select schemaname, viewname from pg_views where schemaname not like 'pg_catalog' and schemaname not like 'information_schema' and definition like '%<tablename>%';

jallender avatar Sep 29 '20 18:09 jallender

Hi, I stumbled upon the same issue. @jallender answer is good if you need to know if there are any dependencies for a specific table but I needed the dependencies for any table in one schema. I ended up with this solution:

create view admin.v_view_dependencies_fixed as (
  with h1 as (
      select generate_series as i
      from generate_series(1, 100) -- we can use this since the query only touches the leader node
  ),
  h2 as (
      select schemaname                                         as dependent_schema,
             viewname                                           as dependent_view,
             schemaname || '.' || viewname                      as dependent_full,
             regexp_substr(definition, 'schema_name\\.\\w+', 1, i) as dependency
      from pg_views
               cross join h1
      where schemaname = 'schema_name'
        and dependency is not null
        and dependency != ''
  )
  select distinct
         dependent_full,
         dependent_schema,
         dependent_view,
         dependency as source_full,
         split_part(dependency, '.', 1) as source_schema,
         split_part(dependency, '.', 2) as source_object
  from h2
  where dependent_full != source_full
);

lennertr avatar Jul 28 '21 09:07 lennertr

I expanded on the previous commenter's view definition to handle a few extra things:

  • Some of the materialized views in my Redshift cluster take advantage of a custom search_path so not all table references included a schema qualifier. For this view, I added some logic to search not only for (probably) column definitions but also for (possibly unqualified) relation references in from and join clauses too.
  • This version searches all MV definitions for any references to any non-system catalog tables, rather than requiring a hardcoded reference to a single schema.
  • I dropped the generate_series piece since I felt any single reference to any of the relevant schemas, tables, or columns was sufficient for my use case.
create view admin.v_mv_dependency as (
    with schemas as (
        -- select all non-system, non-administrative schemas
        select nspname
        from pg_namespace
        where nspname not like 'pg_%'
          and nspname not in ('information_schema', 'admin')
    ),
    schema_rels as (
        -- select all tables and views from all eligible schemas
        (
            select nspname as schemaname,
                   pgv.viewname as relname,
                   'view' as relkind
            from schemas
            join pg_views pgv on (schemas.nspname = pgv.schemaname)
        )
        union
        (
            select nspname as schemaname,
                   pgt.tablename as relname,
                   'table' as relkind
            from schemas
            join pg_tables pgt on (schemas.nspname = pgt.schemaname)
        )
    ),
    mviews as (
        -- select all materialized view definitions from the MV list
        select schemaname,
               viewname,
               definition
        from pg_views
        -- not exactly foolproof, but should be pretty decent at catching only MVs
        where definition ilike '%create materialized view%'
    ),
    possible_mview_dep_matches as (
        -- compute a list of possible matches just using simple substring heuristic
        -- on either schema name or relation (view/table) name
        select mvs.schemaname,
               mvs.viewname,
               any_value(mvs.definition) as definition,
               src.schemaname as objschema,
               relname as objname,
               any_value(relkind) as objkind
        from mviews mvs
            cross join schema_rels src
        where objname is not null
          and (strpos(mvs.definition, src.schemaname) != 0 or
               strpos(mvs.definition, src.relname) != 0)
        group by 1, 2, 4, 5
    ),
    mv_ref_matches as (
        (
            -- from the set of possible matches, find any possible table name matches
            -- by checking for FROM {table} and JOIN {table} within the view definition
            select vw.schemaname                       as dependent_schema,
                   vw.viewname                         as dependent_view,
                   vw.schemaname || '.' || vw.viewname as dependent_full,
                   vw.objschema                        as source_schema,
                   vw.objname                          as source_object,
                   vw.objkind                          as source_kind,
                   regexp_substr(vw.definition,
                                 '(from|join)[[:blank:]]+(' || vw.objschema || '\\.)?(' || vw.objname || ')', 1, 1,
                                 'i')                  as dependency_match,
                   'relation'                          as match_kind
            from possible_mview_dep_matches as vw
            where dependency_match is not null
              and dependency_match != ''
        )
        union
        (
            -- from the set of possible matches, find any possible column name matches
            -- by checking for ({schema}.)?{table}.{column}
            select vw.schemaname                                                                     as dependent_schema,
                   vw.viewname                                                                       as dependent_view,
                   vw.schemaname || '.' || vw.viewname                                               as dependent_full,
                   vw.objschema                                                                      as source_schema,
                   vw.objname                                                                        as source_object,
                   vw.objkind                                                                        as source_kind,
                   regexp_substr(vw.definition,
                                 '(' || vw.objschema || '\\.)?(' || vw.objname || ')\\.\\w+', 1, 1,
                                 'i')                                                                as dependency_match,
                   'column' as match_kind
            from possible_mview_dep_matches as vw
            where dependency_match is not null
              and dependency_match != ''
        )
    )
    select distinct dependent_full,
                    dependent_schema,
                    dependent_view,
                    dependency_match as source_full,
                    source_schema,
                    source_object,
                    source_kind,
                    match_kind
    from mv_ref_matches
    where dependent_full != source_full
);

There are some limitations with my approach which may make it unacceptable for some users:

  • The search is (more) exhaustive than the previous approach since it cross joins all materialized views with every (schema, table) combination, but this means it could take considerably longer. In my database, we only have around 15-20 materialized views and a couple hundred tables across several schemas so this ran in an acceptable timeframe.
    • I tried to reduce runtime by doing some super basic strpos checks before running the more complex regular expression over the view defintiion, but I suspect this won't do much for a database with either a high number of MVs or tables.
  • This pattern cannot account for columns select *, but hopefully it can at least help with identifying tables using the (from|join) heuristic. Sadly, we still have to use regular expressions rather than pg_depend but 🤷

chrisrink10 avatar Aug 11 '21 13:08 chrisrink10

@chrisrink10 thank you for putting this together!

nickthorpe avatar Aug 11 '21 13:08 nickthorpe

@chrisrink10 Thanks.

I have enhanced the regex for 'relation'. My regex skills are not perfect so feel free to check these before using...

  • prevent matching on partial names. e.g. schema.mytable should not match schema.mytable but not schema.mytable2
  • consider double quote identifiers
  • nuance for end of lines

'(from|join)([[:blank:]]|\s)+((")?' || vw.objschema || '(")?\.)?((")?' || vw.objname || '(")?)([[:blank:]]|\s)'

Also looked at the 'column' regex.

  • Added the quote identifier consideration
  • Moved the final \. inside of object name group

'((")?' || vw.objschema || '(")?\.)((")?' || vw.objname || '(")?\.)\w+'

Cannot say that I have tested all scenarios - but working for me so far ....

simonB2020 avatar Jul 04 '23 08:07 simonB2020