scenic
scenic copied to clipboard
refresh with cascade: true - refreshing unrelated materialized views
Hi,
There is a flaw in the Postgres Adapter DependencyParser class to find out which mat views are dependencies of the mat view that is being refreshed. It results in unrelated mat views being refreshed. The reason is tsort(dependency_hash) puts all mat views in the db linearly in an array including ones that have no relation to the mat view being refreshed.
Take this example: A depends on B which depends on C E depends on D which depends on C
No matter what you do, A will either be before E in the array or after it: example #1 [C,B,A,D,E] example #2 [C,D,E,B,A]
If I refresh (cascade: true) mat view E it will also unnecessarily refresh B then A in example #1. If I refresh (cascade: true) mat view A it will also unnecessarily refresh E then D in example #2.
TL;DR - Forcing all mat views in the db into a single linear dependency array will cause unnecessary (and costly) refreshes of unrelated mat views.
EDIT: This problem is not only about mat views having a common 'ancestry'. It also applies to situations with no common 'ancestry':
A depends on B which depends on C F depends on E which depends on D
tsort(dependency_hash) might put all mat views in a single array that will look like the following:
[C,B,A,D,E,F]
Refreshing F with cascade:true will result in all mat views being refreshed.
Any update on this issue?
Any update on this issue?
Well tested patches are welcome
François says: Apparently, Scenic's algorithm for retrieving dependent materialized views cannot be trusted: https://github.com/scenic-views/scenic/issues/275
@yonibaciu
Take this example: A depends on B which depends on C E depends on D which depends on C
If I refresh (cascade: true) mat view E it will also unnecessarily refresh B then A If I refresh (cascade: true) mat view A it will also unnecessarily refresh E then D
There's an argument this is correct, is there not? If we're refreshing e
, then we have to refresh it's direct decendent d
and it's direct descendent c
. It sounds like you're expecting it to stop there. However, we know that b
depends on c
. If We know c
has been refreshed then it stands to reason that b
may change. If we refresh b
then we have to refresh a
for the same reason.
I can see an argument that we should only refresh down the tree so to speak, but I think that would be more surprising to users who know they did something to change e
, so request for it to be refreshed and are surprised a
and b
are now stale.
This problem is not only about mat views having a common 'ancestry'. It also applies to situations with no common 'ancestry':
A depends on B which depends on C F depends on E which depends on D
Refreshing F with cascade:true will result in all mat views being refreshed
This is definitely a bug.
@yonibaciu
Take this example: A depends on B which depends on C E depends on D which depends on C
If I refresh (cascade: true) mat view E it will also unnecessarily refresh B then A If I refresh (cascade: true) mat view A it will also unnecessarily refresh E then D
There's an argument this is correct, is there not? If we're refreshing
e
, then we have to refresh it's direct decendentd
and it's direct descendentc
. It sounds like you're expecting it to stop there. However, we know thatb
depends onc
. If We knowc
has been refreshed then it stands to reason thatb
may change. If we refreshb
then we have to refresha
for the same reason.I can see an argument that we should only refresh down the tree so to speak, but I think that would be more surprising to users who know they did something to change
e
, so request for it to be refreshed and are surpriseda
andb
are now stale.
I respectfully disagree. "cascade" usually means uni-directional, meaning down the tree (or up the tree if you prefer to look at it that way). I would consider another flag for the behavior you are describing.
Perhaps there should be three: "upstream" (the current cascade), "downstream" and "both" where "both" is the behavior you are describing.
@derekprior
here is code to refresh without affecting unnecessary mat views. It also includes refresh_all
and refresh_list
that refresh multiple mat views, maintaining dependency order and making sure a mat view is only refreshed once.
class MatViewRefresher
DEPENDENCY_SQL = <<-SQL
SELECT materialized_view, string_agg(depends_on,',') AS depends_on
FROM (
SELECT DISTINCT class_for_rewrite.relname AS materialized_view, class_for_depend.relname AS depends_on
FROM pg_rewrite AS rewrite
JOIN pg_class AS class_for_rewrite ON rewrite.ev_class = class_for_rewrite.oid
JOIN pg_namespace AS rewrite_namespace ON rewrite_namespace.oid = class_for_rewrite.relnamespace
JOIN pg_depend AS depend ON rewrite.oid = depend.objid
JOIN pg_class AS class_for_depend ON depend.refobjid = class_for_depend.oid
JOIN pg_namespace AS depend_namespace ON depend_namespace.oid = class_for_depend.relnamespace
WHERE class_for_depend.relkind = 'm'
AND rewrite_namespace.nspname = 'public'
AND class_for_rewrite.relkind = 'm'
AND class_for_depend.relname != class_for_rewrite.relname
) a
GROUP BY materialized_view
ORDER BY materialized_view;
SQL
def refresh(mat_view, concurrently: false, cascade: false, logger: Logger.new(STDOUT))
@dependencies = ActiveRecord::Base.connection.select_all(DEPENDENCY_SQL)
@already_refreshed = []
if cascade
refresh_recursive(mat_view, concurrently: concurrently, logger: logger)
else
refresh_simple(mat_view, concurrently: concurrently, logger: logger)
end
end
def refresh_all(concurrently: false, logger: Logger.new(STDOUT))
@dependencies = ActiveRecord::Base.connection.select_all(DEPENDENCY_SQL)
@already_refreshed = []
list_mat_views = ActiveRecord::Base.connection.select_values("SELECT oid::regclass::text FROM pg_class WHERE relkind = 'm'")
list_mat_views.each do |mat_view|
refresh_recursive(mat_view, concurrently: concurrently, logger: logger)
end
end
def refresh_list(list_mat_views, concurrently: false, logger: Logger.new(STDOUT))
@dependencies = ActiveRecord::Base.connection.select_all(DEPENDENCY_SQL)
@already_refreshed = []
list_mat_views.each do |mat_view|
refresh_recursive(mat_view, concurrently: concurrently, logger: logger)
end
end
private
def refresh_recursive(mat_view, concurrently: false, logger:)
depends_on_row = @dependencies.detect { |row| row['materialized_view'] == mat_view }
depends_on_mat_views = []
if depends_on_row
depends_on_mat_views = depends_on_row['depends_on'].split(',')
depends_on_mat_views.each do |depends_on_mat_view|
refresh_recursive(depends_on_mat_view, concurrently: concurrently, logger: logger)
end
end
unless @already_refreshed.include?(mat_view)
refresh_simple(mat_view, concurrently: concurrently, logger: logger)
@already_refreshed << mat_view
end
end
def refresh_simple(mat_view, concurrently: false, logger:)
logger.info("Refreshing the #{mat_view} view #{concurrently ? 'concurrently' : 'not concurrently'}") unless Rails.env.test?
ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW #{concurrently ? 'CONCURRENTLY' : ''} #{mat_view}")
end
end