rails icon indicating copy to clipboard operation
rails copied to clipboard

Allow :schema_format in database configuration to override ActiveRecord default

Open leboshi opened this issue 1 year ago • 4 comments

Summary

The schema dump format (Ruby/SQL) can currently only be set globally, either via setting ActiveRecord.schema_format or by setting ENV['SCHEMA_FORMAT']. Thanks to #43530 we can set the dump filename, but that has no bearing on the format Rails dumps or expects to load from. This PR allows you to set schema_format per-database in your configuration YAML. It will still be overridden by ENV['SCHEMA_FORMAT'] when set and will fall back to ActiveRecord.schema_format.

Fixes #45596 (and the second half of #43173, at last).

Changes

  • ActiveRecord::DatabaseConfigurations::HashConfig:
    • Deprecate passing a format to schema_dump
    • New schema_format method that checks
      1. ENV['SCHEMA_FORMAT']
      2. the database's configuration hash
      3. the default ActiveRecord.schema_format
  • Remove schema format from invocations of methods that otherwise have access to DB config hashes
    • Deprecate passing a format as a parameter to load_schema, schema_up_to_date?, reconstruct_from_schema, dump_schema, schema_dump_path, and load_schema_current in DatabaseTasks

Notes

  • Didn't specify a deprecation horizon in the deprecation messages.
  • I like that this DRYs up the ENV['SCHEMA_FORMAT'] override, which was in multiple places in databases.rake before, but it does seem smelly to me that the HashConfig is checking ENV vars; that felt a lot more natural in a Rakefile. However, there really isn't another place I saw it making sense, moving toward keeping the format in the DB config and removing it from being passed around separately. Open to feedback if anybody has thoughts on a cleaner place to put that override.

leboshi avatar Jul 14 '22 05:07 leboshi

Hmmmm, looking into the Railties test failures.

leboshi avatar Jul 14 '22 06:07 leboshi

Wow, ok, @eileencodes, you're awesome. Thank you for all the feedback! Here we are again 😅

@matthewd, I completely agree. The flip side is, yeah, it also feels very 🤔 not to explicitly deprecate it but to effectively deprecate it, still using it only internally to facilitate the ENV var but not objecting when used manually... I really wasn't sure what was less 🤯

Before I go diving back in to resolve all this... This may be a larger conversation, but zooming out and backing up, what do you think the ideal solution looks like? Thoughts from someone who loves Rails but isn't a Rails maintainer (😅):

  1. In terms of the Principle of Least Surprise, I think I'd expect the simplest solution to work, which is exactly what you suggested, @eileencodes: parsing out the format from the schema dump filename. (I think I was trying to avoid the complexity of string parsing, but...alas.)
  2. I also agree it can be just as surprising if you can independently specify the dump filename and format, as it is now when the filename isn't consulted at all regarding the format: in both cases, you can end up with schema.rb in SQL format or vice-versa. However...
  3. In terms of convention-over-configuration, it also seems un-Rails-y to require someone to specify an entire filename just to specify the dump format. You lose the elegance of the automatic filenames. If I pin my schema dump filename for my secondary database to secondary_schema.rb the way Rails would name it for me, and months later I decide to rename the database to follower or something, I'd definitely be wondering why in the world I seem to be migrating fine but there's no follower_schema.rb appearing!
  4. There's already a lot of complexity here: you have the actual AR default set in your config, you have an ENV var overriding it globally but only for a single invocation of whatever you're running, and we're discussing one or more options you can set in the database config YAML to override it for each database. And, if ENV["SCHEMA"] is also consulted to try to determine the format, we end up having to define orders of precedence and which cases we disallow two things to be set at once. Is there an opportunity to unify any of this?
  5. Specifically, using the ENV var to specify schema format was a surprising update for me, in between submitting the original PR and this one. It feels like a hotfix-y solution to this problem: it lets you rake db:schema:dump:primary and then SCHEMA_FORMAT=sql rake db:schema:dump:secondary, etc., but the ability to just rake db:migrate is lost to you. However,
  6. The point of @ghiculescu's PR (#44834) was actually that sometimes you need to be able to export both SQL and Ruby formats for the same database.

So with all that in mind, what if we change schema_dump to accept a hash, as:

ghiculescu:
  schema_dump: # no filename specified, so generates ghiculescu_schema.rb and ghiculescu_structure.sql
    format:
      - ruby
      - sql
secret:
  schema_dump: false # no dump
leboshi:
  schema_dump: # generates my_db.rb
    basename: my_db
    format: ruby
someone_else:
  schema_dump: # generates main.rb or main.sql, according to ActiveRecord.schema_format
    basename: main

And format can be either a single value or an array. That way

  • people who don't want to set a filename but only the format can do so (and still use Rails's default filenames),
  • people who need to pin their schema dump filenames down for sharding or other purposes can do so, AND
  • people who need none, one, or both schema formats can dump them easily!

I feel like that'd be the most generally applicable system for all use cases, right?

And then I'd love to deprecate both the single string value for schema_dump, and both ENV["SCHEMA"] and ENV["SCHEMA_FORMAT"]. That way, there's no nasty string parsing necessary, and there's no need for any hierarchal conflict detection or resolution.

🥵 Sorry, that was long. Sorry for turning the PR into an RFC. Let me know what you think! Happy to proceed making the changes you recommended above, but I do feel like it's at the point where it might be making a bigger mess to clean up later.

leboshi avatar Jul 26 '22 04:07 leboshi

How would you get the default setup (a single schema.rb file)? Would it be this?

  schema_dump:
    format: ruby

And how would all this interact with ActiveRecord.schema_format?

ghiculescu avatar Jul 26 '22 13:07 ghiculescu

The default would just be omitted from database.yml. In true Rails fashion, I think the majority of cases wouldn't need to change anything about the schema dump. It's only if you need to lock down your filename or format that you'd include this at all.

That's a good question about ActiveRecord.schema_format. Back in single db days, that was the setting, right?. Then when multiple db support was added, it stayed the only setting, which wasn't flexible enough. With the ENV and schema_dump PRs, though, it sort of silently shifted semantics to be a global default. My inclination is not to touch that. If you don't specify anything in database.yml, it uses the default filename and uses ActiveRecord.schema_format.

But maybe at that point it's worth aliasing it as ActiveRecord.default_schema_format and deprecating schema_format?

leboshi avatar Jul 26 '22 15:07 leboshi