scenic Add topological sorting for dumped views via TSort

Rebase of https://github.com/scenic-views/scenic/pull/337

Oct 25 '23 15:10 calebhearth

Hello! We (me and @thorncp) have been trying this code, but it hasn't worked for us. Here's exactly what we tried:

pin scenic to 1431c49ec6d6063212aa718db73cc6158faa1461
create view as
migrate
This added as view at the top of the views in db/schema (alphabetical sort)
create view bs
migrate
added bs view after as in the db/schema (alphabetical sort)
replace_view as with select * from bs to generate dependent order
- We also tried it with update_view and got the same outcome
migrate
as replaced the view and did not move it below bs

At this point, our schema can't be loaded into the database, because it will try to create as using bs but bs does not yet exist.

rails db:schema:load

ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR:  relation "bs" does not exist
LINE 2:      FROM bs;
                  ^
/path/to/db/schema.rb:4416:in `block in <main>'
/path/to/db/schema.rb:13:in `<main>'

Caused by:
PG::UndefinedTable: ERROR:  relation "bs" does not exist
LINE 2:      FROM bs;
                  ^

We also tried this with our existing views. Before we dumped the schema they were in a valid, topological order. After we dumped, they were in an invalid non-topological order.

It this expected? As it stands, our app would break if this got merged.

Nov 17 '23 16:11 edwardloveall

This is not expected. The idea of this PR is to provide both correct ordering (which we have today) and stable ordering (which we do not).

Your manual steps are a good test case for us to consider including in our suite today (that bs must come before as) to prevent regression. I'm a bit surprised we don't have something like that already though...

Nov 19 '23 17:11 derekprior

Thanks for the confirmation.

FWIW, I just wrote a test against this commit and it passes:

it "sorts dependency order over alphabetical order" do
  conn.create_view "as", sql_definition: <<-SQL
    SELECT 1;
  SQL
  conn.create_view "bs", sql_definition: <<-SQL
    SELECT 2;
  SQL
  conn.update_view "as", sql_definition: <<-SQL
    SELECT * FROM bs;
  SQL
  
  expect(views).to eq(%w[bs as])
end

I also tried conn.replace_view but that doesn't take a sql_definition and I didn't research further.

Maybe it's our setup somehow? Maybe there's some untested behavior if you migrate in between creating and updating dependent views?

Nov 19 '23 18:11 edwardloveall

Update: I tried this on a brand new app and it did put bs before as. So I'm thinking it has something to do with our particular app. Hopefully I'll get some more info for you.

Nov 19 '23 22:11 edwardloveall

We (@heatherbooker and I) figured it out today. Our database views are not in the public schema, they exist in a different schema set via the config/database.yml that looks something like this:

production:
  database: app_production
  schema_search_path: "abc"

The SQL that generates the list of dependent views does not include the schema name, but the code to generate view names does include the non-public schema name.

This breaks the logic that prepares the views to be TSorted.

We updated the select portion of the SQL and now our example case passes and puts as below bs:

-      SELECT distinct dependent_ns.nspname AS dependent_schema
-      , dependent_view.relname AS dependent_view
-      , source_ns.nspname AS source_schema
-      , source_table.relname AS source_table
+      SELECT DISTINCT
+        dependent_ns.nspname AS dependent_schema,
+        CASE
+          WHEN dependent_ns.nspname = 'public' THEN dependent_view.relname
+          ELSE (dependent_ns.nspname || '.' || dependent_view.relname)
+        END AS dependent_view,
+        source_ns.nspname AS source_schema,
+        CASE
+          WHEN source_ns.nspname = 'public' THEN source_table.relname
+          ELSE (source_ns.nspname || '.' || source_table.relname)
+        END AS source_table
       FROM pg_depend
       JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
       JOIN pg_class as dependent_view ON pg_rewrite.ev_class = dependent_view.oid
       JOIN pg_class as source_table ON pg_depend.refobjid = source_table.oid
       JOIN pg_namespace dependent_ns ON dependent_ns.oid = dependent_view.relnamespace
       JOIN pg_namespace source_ns ON source_ns.oid = source_table.relnamespace
       WHERE dependent_ns.nspname = ANY (current_schemas(false)) AND source_ns.nspname = ANY (current_schemas(false))
       AND source_table.relname != dependent_view.relname
       AND source_table.relkind IN ('m', 'v') AND dependent_view.relkind IN ('m', 'v')
-      ORDER BY dependent_view.relname;
+      ORDER BY dependent_view;

This should replicate the current scenic logic.

I also created a test, but it's failing when the new test just before it (the test added in this PR on line 152) runs before it. Not sure why:

it "sorts dependency order when views exist in a schema" do
  conn.execute("CREATE SCHEMA IF NOT EXISTS scenic; SET search_path TO scenic")
  conn.execute("CREATE VIEW apples AS SELECT 1;")
  conn.execute("CREATE VIEW bananas AS SELECT 2;")

  conn.execute("CREATE OR REPLACE VIEW apples AS SELECT * FROM bananas;")

  expect(views).to eq(%w[scenic.bananas scenic.apples])

  conn.execute("DROP SCHEMA IF EXISTS scenic CASCADE;")
end

Nov 20 '23 21:11 edwardloveall

At my jobby job we've been using this since 10/26 and have made several updates to views (though none that change the dependency structure) and have not seen any unnecessary reordering of views, so I'm inclined after ~2.5 months to call this a win.

@edwardloveall I wonder if #401 helps with your use case at all? Also if you want to write up a PR to add specs that exercise view ordering I'd appreciate it.

Jan 17 '24 18:01 calebhearth

It's good enough for me. Merge when ready.

When we had talked about this problem originally (years ago), we said we would make this a "breaking change" for 2.0 since it would likely change everyone's generated structure.sql once. Do you still feel that's appropriate?

Jan 17 '24 18:01 derekprior

Unclear - I think we'd re-order the schema.rb one more time whenever it's next generated (with a view change?) which has already been happening, then it should stop. That may not be a breaking change.

Jan 17 '24 18:01 calebhearth

@edwardloveall I wonder if #401 helps with your use case at all?

It does, but only in combination with the changes here.

Also if you want to write up a PR to add specs that exercise view ordering I'd appreciate it.

Done: https://github.com/scenic-views/scenic/pull/403

Jan 22 '24 00:01 edwardloveall

I'm seeing a failure on one of the specs here after @edwardloveall's PR adding sort order specs was merged. I'm fairly sure that the spec is correct and I'm not sure why the failure is happening.

Feb 23 '24 17:02 calebhearth

I'm seeing a failure on one of the specs here after @edwardloveall's PR adding sort order specs was merged. I'm fairly sure that the spec is correct and I'm not sure why the failure is happening.

Looking at the branch, it's similar to my previous comment: the SQL that generates the dependent views does not contain the schema, but Scenic's views do. tsorted_views then sorts the non-schema-qualified views correctly and ignores the schema-qualified views. Then the non-schema-qualified views get filtered out.

Here's one possible fix:

  def tsorted_views(views_names)
    views_hash = TSortableHash.new
  
    ::Scenic.database.execute(DEPENDENT_SQL).each do |relation|
-     source_v = relation["source_table"]
-     dependent = relation["dependent_view"]
+     source_v = [relation["source_schema"], relation["source_table"]].join(".")
+     dependent = [relation["dependent_schema"], relation["dependent_view"]].join(".")
      views_hash[dependent] ||= []
      views_hash[source_v] ||= []
      views_hash[dependent] << source_v
      views_names.delete(source_v)
      views_names.delete(dependent)
    end
  
    # after dependencies, there might be some views left
    # that don't have any dependencies
    views_names.sort.each { |v| views_hash[v] = [] }
    binding.irb
  
    views_hash.tsort
  end

Feb 23 '24 18:02 edwardloveall

I'm seeing a failure on one of the specs here after @edwardloveall's PR adding sort order specs was merged. I'm fairly sure that the spec is correct and I'm not sure why the failure is happening.

Looking at the branch, it's similar to my previous comment: the SQL that generates the dependent views does not contain the schema, but Scenic's views do. tsorted_views then sorts the non-schema-qualified views correctly and ignores the schema-qualified views. Then the non-schema-qualified views get filtered out.

Here's one possible fix:
  def tsorted_views(views_names)
    views_hash = TSortableHash.new
  
    ::Scenic.database.execute(DEPENDENT_SQL).each do |relation|
-     source_v = relation["source_table"]
-     dependent = relation["dependent_view"]
+     source_v = [relation["source_schema"], relation["source_table"]].join(".")
+     dependent = [relation["dependent_schema"], relation["dependent_view"]].join(".")
      views_hash[dependent] ||= []
      views_hash[source_v] ||= []
      views_hash[dependent] << source_v
      views_names.delete(source_v)
      views_names.delete(dependent)
    end
  
    # after dependencies, there might be some views left
    # that don't have any dependencies
    views_names.sort.each { |v| views_hash[v] = [] }
    binding.irb
  
    views_hash.tsort
  end

Guess my comment on original PR was lost, mentioned in here https://github.com/scenic-views/scenic/pull/337#pullrequestreview-1231488975, we have been running with those changes since then without issues in apps that use cross schema views.

Feb 27 '24 09:02 Unknown-Guy

Superseded by https://github.com/scenic-views/scenic/pull/416

May 02 '24 15:05 calebhearth

scenic scenic copied to clipboard

Add topological sorting for dumped views via TSort

scenic
scenic copied to clipboard