scenic icon indicating copy to clipboard operation
scenic copied to clipboard

refresh with cascade: true - refreshing unrelated materialized views

Open yonibaciu opened this issue 6 years ago • 7 comments

Hi,

There is a flaw in the Postgres Adapter DependencyParser class to find out which mat views are dependencies of the mat view that is being refreshed. It results in unrelated mat views being refreshed. The reason is tsort(dependency_hash) puts all mat views in the db linearly in an array including ones that have no relation to the mat view being refreshed.

Take this example: A depends on B which depends on C E depends on D which depends on C

No matter what you do, A will either be before E in the array or after it: example #1 [C,B,A,D,E] example #2 [C,D,E,B,A]

If I refresh (cascade: true) mat view E it will also unnecessarily refresh B then A in example #1. If I refresh (cascade: true) mat view A it will also unnecessarily refresh E then D in example #2.

TL;DR - Forcing all mat views in the db into a single linear dependency array will cause unnecessary (and costly) refreshes of unrelated mat views.

EDIT: This problem is not only about mat views having a common 'ancestry'. It also applies to situations with no common 'ancestry':

A depends on B which depends on C F depends on E which depends on D

tsort(dependency_hash) might put all mat views in a single array that will look like the following:

[C,B,A,D,E,F]

Refreshing F with cascade:true will result in all mat views being refreshed.

yonibaciu avatar Feb 15 '19 20:02 yonibaciu

Any update on this issue?

francois-belle avatar Jan 21 '20 13:01 francois-belle

Any update on this issue?

Well tested patches are welcome

derekprior avatar Feb 02 '20 22:02 derekprior

François says: Apparently, Scenic's algorithm for retrieving dependent materialized views cannot be trusted: https://github.com/scenic-views/scenic/issues/275

jgigault avatar Feb 24 '20 06:02 jgigault

@yonibaciu

Take this example: A depends on B which depends on C E depends on D which depends on C

If I refresh (cascade: true) mat view E it will also unnecessarily refresh B then A If I refresh (cascade: true) mat view A it will also unnecessarily refresh E then D

There's an argument this is correct, is there not? If we're refreshing e, then we have to refresh it's direct decendent d and it's direct descendent c. It sounds like you're expecting it to stop there. However, we know that b depends on c. If We know c has been refreshed then it stands to reason that b may change. If we refresh b then we have to refresh a for the same reason.

I can see an argument that we should only refresh down the tree so to speak, but I think that would be more surprising to users who know they did something to change e, so request for it to be refreshed and are surprised a and b are now stale.

derekprior avatar Dec 18 '21 21:12 derekprior

This problem is not only about mat views having a common 'ancestry'. It also applies to situations with no common 'ancestry':

A depends on B which depends on C F depends on E which depends on D

Refreshing F with cascade:true will result in all mat views being refreshed

This is definitely a bug.

derekprior avatar Dec 18 '21 21:12 derekprior

@yonibaciu

Take this example: A depends on B which depends on C E depends on D which depends on C

If I refresh (cascade: true) mat view E it will also unnecessarily refresh B then A If I refresh (cascade: true) mat view A it will also unnecessarily refresh E then D

There's an argument this is correct, is there not? If we're refreshing e, then we have to refresh it's direct decendent d and it's direct descendent c. It sounds like you're expecting it to stop there. However, we know that b depends on c. If We know c has been refreshed then it stands to reason that b may change. If we refresh b then we have to refresh a for the same reason.

I can see an argument that we should only refresh down the tree so to speak, but I think that would be more surprising to users who know they did something to change e, so request for it to be refreshed and are surprised a and b are now stale.

I respectfully disagree. "cascade" usually means uni-directional, meaning down the tree (or up the tree if you prefer to look at it that way). I would consider another flag for the behavior you are describing.

Perhaps there should be three: "upstream" (the current cascade), "downstream" and "both" where "both" is the behavior you are describing.

yonibaciu avatar May 23 '22 18:05 yonibaciu

@derekprior

here is code to refresh without affecting unnecessary mat views. It also includes refresh_all and refresh_list that refresh multiple mat views, maintaining dependency order and making sure a mat view is only refreshed once.

  class MatViewRefresher
    DEPENDENCY_SQL = <<-SQL
      SELECT materialized_view, string_agg(depends_on,',') AS depends_on
      FROM (
        SELECT DISTINCT class_for_rewrite.relname AS materialized_view, class_for_depend.relname AS depends_on
        FROM pg_rewrite AS rewrite
        JOIN pg_class AS class_for_rewrite ON rewrite.ev_class = class_for_rewrite.oid
        JOIN pg_namespace AS rewrite_namespace ON rewrite_namespace.oid = class_for_rewrite.relnamespace
        JOIN pg_depend AS depend ON rewrite.oid = depend.objid
        JOIN pg_class AS class_for_depend ON depend.refobjid = class_for_depend.oid
        JOIN pg_namespace AS depend_namespace ON depend_namespace.oid = class_for_depend.relnamespace
        WHERE class_for_depend.relkind = 'm'
          AND rewrite_namespace.nspname = 'public'
          AND class_for_rewrite.relkind = 'm'
          AND class_for_depend.relname != class_for_rewrite.relname
      ) a
      GROUP BY materialized_view
      ORDER BY materialized_view;
    SQL

    def refresh(mat_view, concurrently: false, cascade: false, logger: Logger.new(STDOUT))
      @dependencies = ActiveRecord::Base.connection.select_all(DEPENDENCY_SQL)
      @already_refreshed = []

      if cascade
        refresh_recursive(mat_view, concurrently: concurrently, logger: logger)
      else
        refresh_simple(mat_view, concurrently: concurrently, logger: logger)
      end
    end

    def refresh_all(concurrently: false, logger: Logger.new(STDOUT))
      @dependencies = ActiveRecord::Base.connection.select_all(DEPENDENCY_SQL)
      @already_refreshed = []

      list_mat_views = ActiveRecord::Base.connection.select_values("SELECT oid::regclass::text FROM pg_class WHERE relkind = 'm'")
      list_mat_views.each do |mat_view|
        refresh_recursive(mat_view, concurrently: concurrently, logger: logger)
      end
    end

    def refresh_list(list_mat_views, concurrently: false, logger: Logger.new(STDOUT))
      @dependencies = ActiveRecord::Base.connection.select_all(DEPENDENCY_SQL)
      @already_refreshed = []

      list_mat_views.each do |mat_view|
        refresh_recursive(mat_view, concurrently: concurrently, logger: logger)
      end
    end

    private

    def refresh_recursive(mat_view, concurrently: false, logger:)
      depends_on_row = @dependencies.detect { |row| row['materialized_view'] == mat_view }
      depends_on_mat_views = []
      if depends_on_row
        depends_on_mat_views = depends_on_row['depends_on'].split(',')
        depends_on_mat_views.each do |depends_on_mat_view|
          refresh_recursive(depends_on_mat_view, concurrently: concurrently, logger: logger)
        end
      end
      unless @already_refreshed.include?(mat_view)
        refresh_simple(mat_view, concurrently: concurrently, logger: logger)
        @already_refreshed << mat_view
      end
    end

    def refresh_simple(mat_view, concurrently: false, logger:)
      logger.info("Refreshing the #{mat_view} view #{concurrently ? 'concurrently' : 'not concurrently'}") unless Rails.env.test?
      ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW #{concurrently ? 'CONCURRENTLY' : ''} #{mat_view}")
    end
  end

yonibaciu avatar May 23 '22 18:05 yonibaciu