tapioca Suggestion: Implement `<Model>::Persisted` opaque type to solve column nilability confusion

The Problem

The tapioca compiler for ActiveRecord models makes most columns nilable, ignoring database constraints. This is correct on paper, because when you call Model.new none of those would matter, until you eventually try to persist the record in the database.

In practice though, this is a common pitfall and cause for confusion, especially for those just getting started with Sorbet.

This is not a new problem, it has in fact been discussed in the past both here and in the Sorbet Slack workspace:

https://github.com/Shopify/tapioca/issues/600
https://github.com/Shopify/tapioca/issues/1342
https://github.com/Shopify/tapioca/issues/1642

Existing Solutions

There are currently two possible paths to address this:

The most common one is to just deal with this in the code and use T.must whenever Sorbet complains about nilability and we know the record has been persisted, so the column cannot be nil.
The other, less known and currently undocumented solution is to define a StrongTypeGeneration module in the model to tell the compiler to generate the class following the database constraints (as explained in this comment). This will make the class work in most cases, but it won't allow you to call a non-nullable column on a newly instantiated model without throwing an error (e.g. Model.new.non_nilable_column)

The two solution seem mutually exclusive and present pros/cons to different cases, so developers will always need to weight in with additional T.must or T.unsafe to "correct" Sorbet misunderstandings. Ideally, we'd like to teach sorbet exactly what to expect.

A proposal for an alternative solution

I recently stumbled across the concept of "opaque types" thanks to an article from Jez.

After reading the article, I began wondering if it wouldn't be possible to change the compiler so that on top of the Model class, a Model::Persisted or PersistedModel class would be generated as well. The two classes would be identical and they'd only differ by the fact the the latter will have stricter constraints on columns, reflecting database constraints.

Then, methods signatures can be changed to return one type or the other. For example:

# app/models/user.rb
class User < ActiveRecord::Base
  # == Schema Information
  #
  # Table name: users
  #
  #  id                          :integer          not null, primary key
  #  email                       :string           default(""), not null
  #  encrypted_password          :string           default(""), not null
  #  created_at                  :datetime         not null
  #  updated_at                  :datetime         not null
end

# user.rbi
module CommonMethods
  sig { returns(PersistedUser) }
  def save; end

  sig { returns(PersistedUser) }
  def find_by; end

  sig { returns(PersistedUser) }
  def update; end
end

class User
  include CommonMethods

  sig { returns(T.nilable(Integer)) }
  def id; end

  sig { returns(T.nilable(String)) }
  def email; end

  ...
end

class PersistedUser # or User::Persisted ?
  include CommonMethods

  sig { returns(Integer) }
  def id; end

  sig { returns(String) }
  def email; end

  ...
end

cc @rafaelfranca @Morriar @KaanOzkan based on your past contributions to the topic 🙏

Sep 21 '23 09:09 iMacTia

Gentle nudge on this one, I'm not pretending to see a PR to get this fixed, I merely want to kick off the discussion to understand if this makes sense or not and if you'd like to introduce it in tapioca, then I'm happy to take a stab at it myself if necessary 😄

Oct 12 '23 13:10 iMacTia

The column can be nilable so we can't always use PersistedUser. In your example CommonMethods will always return PersistedUser. Also the common methods are defined under ActiveRecord RBIs and shared for all models.

Would something like this work? Having User and PersistedUser classes each with their own set of methods and developer could manually specify if an instance is persisted and use that type rest of the way.

# sorbet/rbi/dsl/persistent_user.rbi
class PersistedUser
  sig { returns(Integer) }
  def id; end

  sig { returns(String) }
  def email; end
end

# models/user.rb
class User < ApplicationRecord
  sig { returns(PersistedUser) }
  def save
    super
  end
end

Oct 16 '23 14:10 KaanOzkan

The column can be nilable so we can't always use PersistedUser. In your example CommonMethods will always return PersistedUser. Also the common methods are defined under ActiveRecord RBIs and shared for all models.

I might have chosen the wrong names and created some confusion, but if I check the tapioca-generated RBIs for my project's models, they each have these methods specified (except for save, maybe because it actually returns a boolean?). So apologies if my example is not precise, but the idea is for all those methods (find, first, last, etc...) to return a PersistedUser rather than User. Obviously, new would return User since we'd expect those columns to be nilable, but that would be one of the very few cases where we'd deal with a standard User, I believe?

Would something like this work? Having User and PersistedUser classes each with their own set of methods and developer could manually specify if an instance is persisted and use that type rest of the way.

I don't think this would work, because then the two classes would be completely different and not interchangeable. Let's not forget that PersistedUser would not exist at runtime, so whenever we use it in a signature it should be able to represent also User.

Oct 17 '23 13:10 iMacTia

Playing some more with this and I managed to get something to work in Sorbet's playground: see example

I had to abandon the idea of a Persisted::User in favour of Unpersisted::User: this is because when you deal with the union type, T.nilable is "contagious" so it makes sense to flip the proposal.

This effectively means we'd always deal with persisted records, except for when you call new or build, where you'd get an Unpersisted::<model> record (yes, I hate the name, I'm sure we'll find something better).

Oct 17 '23 14:10 iMacTia

This is a really good idea and would be a huge improvement to Tapioca.

How would you implement save? This method returns a boolean but would need to narrow the type to User. As far as I understand, type narrowing methods don't exist in sorbet. I opened this issue over a year ago that has yet to be addressed.

In any case, I think this would be a huge improvement to Tapioca's models, allowing 100% type safety without sacrificing the convenience of not having to assert everywhere.

Mar 02 '24 15:03 stathis-alexander

This is a really great idea! Would love to help however I can to try to make this more real, since it would definitely be a UX improvement. And I actually like the inversion to have Unpersisted::User because that's probably the less common flow, so most of the time, you wouldn't need to cast or worry about these alternate types.

Is there a path to land this sooner as we work it out if, instead of requiring the Unpersisted types to be returned by ActiveRecord APIs, we start by just including the new type definitions for folks and we can manually cast as needed for our use cases? That would still be an improvement over T.must all over the place. But have never contributed directly to Tapioca so open to whatever makes sense!

Apr 18 '24 17:04 alexgraffeocohen

@stathis-alexander @alexgraffeocohen sorry I didn't mean to drop the ball on this, and I'm still very much interested in making this work. I'm just struggling to find the time at the moment 😅.

If any of you would like to push this forward or make any progress, here is what a possible roadmap could look like:

Get a PoC working on the Sorbet playground <-- we're here, the one I shared in my last comment seems a good starting point
Test in a real Rails application by manually creating the RBIs based on the PoC. This can be a sample app, doesn't need to be enterprise/production.
Once we're confident with the RBIs, the next step is to write a tapioca compiler that will generate them automatically. NOTE: This might also require changes to the existing AR compilers to avoid conflicts on signatures.

How would you implement save?

I'm not sure I understand the need for narrowing the type. You don't need to pass a record as a parameter, so the signature would simply be sig { returns(T::Boolean) } 🤔

Is there a path to land this sooner as we work it out

I don't think there is. Sorbet requires these signatures to be statically evaluated, meaning if we want the same object to behave differently, we'll effectively need 2 separate objects.

Apr 18 '24 19:04 iMacTia

As a temporary stop-gap solution, we were considering adding a new method like id!:

class ApplicationRecord
  sig { returns(Integer) } # _not_ nilable
  def id! = id or raise SomeError
end

Creating a module that overrides the existing id would be even better!

Apr 18 '24 20:04 amomchilov

@iMacTia

I'm not sure I understand the need for narrowing the type. You don't need to pass a record as a parameter, so the signature would simply be sig { returns(T::Boolean) } 🤔

If I instantiate a new object:

my_object = MyObject.new(**parameters)
reveal_type(my_object) # should show unpersisted object type

it should by default have the Unpersisted type. Later, when I save that object, it needs to "narrow" into the persisted type.

my_object.save!
reveal_type(my_object) # should show persisted type

I don't believe sorbet has any support for this kind of thing at the moment.

Apr 18 '24 20:04 stathis-alexander

@stathis-alexander Ah no I don't think that's possible at all! That would mean changing the type of the variable simply by calling a method on it. However, sorbet does allow re-assigning a variable (with some limitations, see error 7001), so you should be able to do the following:

my_object = MyObject.new(**parameters)
reveal_type(my_object) # Unpersisted::MyObject

my_object = my_object.save!
reveal_type(my_object) # MyObject

Apr 19 '24 09:04 iMacTia

# create_table :comments, force: true do |t|
#   t.integer :post_id, null: false
# end
comment = Comment.find(1)
comment.post_id = nil
comment.valid?

This code is totally valid code and should still work and with what you are proposing it would not.

Worse yet, something like this would explode at runtime but pass type checking with what is being proposed here.

class Comment < ApplicationRecord
  before_validation :transform_post_id

  def transform_post_id
    self.post_id = self.post_id + 1
  end
end

comment = Comment.find(1)
comment.post_id = nil
comment.valid?

I don't think the type system should lie about the runtime behavior of the system.

Apr 24 '24 19:04 rafaelfranca

This code is totally valid code and should still work and with what you are proposing it would not.

Fair, but how common would that use-case be? And what if it could be solved with a T.cast(comment, Unpersisted::MyObject)?

Worse yet, something like this would explode at runtime but pass type checking with what is being proposed here.

I'm not sure I follow this one, the last three lines are the same as the example above which we agreed would not pass type-checking, so I guess there's a typo in the code unless I'm missing something obvious 🤔

Anyway, I completely understand the "the type system shouldn't lie" position and honestly it's hard to argue against it. My point I guess is: is it right to cause friction on development for the sake of correctness? From my standpoint, I'd favour a solution that might sacrifice some correctness if it helps improving the experience of the majority of cases.

For example (which might relate to your second snippet above), I'd prefer dealing with an immutable object retrieved from the DB but that it gives me some guarantees, rather than having to deal with constant nullability. To achieve that, we could make all setters forbidden (e.g. post_id= or assign_attributes) on persisted object, and if you do need to make changes then type casting or even re-instantiation might be needed.

Now that I said that out loud, I wonder if this might pair well with readonly! 😲

Apr 25 '24 08:04 iMacTia

Anyway, I completely understand the "the type system shouldn't lie" position

I think tapioca as a community has already taken the position that accuracy is more important than ergonomics, for better or worse.

I think the proper way to handle this dichotomy is to allow this as a configuration option. I think many would agree that the convenience here is worth the slightly less accurate typing, as evidenced by the number of questions, issues, etc that come up concerning the default nil-able types generated for active record models with non-nil validations (or non-null column constraints). Those of us who feel that way should be able to toggle tapioca into a more relaxed type system.

Ultimately, sorbet was built to enable higher developer productivity and improved developer experience.

Apr 25 '24 11:04 stathis-alexander

Making this configurable would definitely make everyone happy, but as an OSS maintainer myself, I know that's the easiest way to get things harder to maintain. So happy to leave this open a bit more to facilitate discussion, this is not a bug anyway (and to that end, happy to turn this into a Discussion if you prefer)

Apr 25 '24 14:04 iMacTia

Rather than making another configuration option for this in Tapioca, can't you disable the default DSL compiler and provide your own with the behavior that suits you?

Apr 25 '24 14:04 Morriar

@Morriar - Yes, that's of course an option, although... it requires a heavy lift and would need to be maintained in parallel with the default one.

Instead, this change allows monkey patching existing behavior. I've got something worked out already for the nil-able behavior for AR columns, re-using some helper method from the now deprecated sorbet-rails:

module Tapioca
  module Dsl
    module Helpers
      class ActiveRecordColumnTypeHelper
        include SorbetRailsBorrowedMethods

        # https://github.com/Shopify/tapioca/blob/main/lib/tapioca/dsl/helpers/active_record_column_type_helper.rb#L37
        def column_type_for(column_name)
          return ['T.untyped', 'T.untyped'] if do_not_generate_strong_types?(@constant)

          column = @constant.columns_hash[column_name]
          column_type = @constant.attribute_types[column_name]
          getter_type = type_for_activerecord_value(column_type)
          setter_type =
            case column_type
            when ActiveRecord::Enum::EnumType
              enum_setter_type(column_type)
            else
              getter_type
            end

          if column&.null && !attribute_has_unconditional_presence_validation?(column_name)
            getter_type = as_nilable_type(getter_type) unless not_nilable_serialized_column?(column_type)
            return [getter_type, as_nilable_type(setter_type)]
          end

          [getter_type, setter_type]
        end
      end
    end
  end
end

It's on my list of things to do to host a custom compilers repo that others can share (money-rails being an example).

Apr 25 '24 14:04 stathis-alexander

Can be closed by https://github.com/Shopify/tapioca/pull/1888 perhaps?

cc: @paracycle

Jul 05 '24 10:07 aericson

#1888 is a very much welcomed change and we've already started using it in our application! But to be fair, the idea here is slightly different: while #1888 now allows you to choose how your AR class is typed, this suggestion wants to actually have 2 different classes (statically) to properly deal with both cases.

The solution provided by #1888 asks you to choose between the two cases and apply that everywhere, even where it doesn't really holds true (e.g. if you go :persisted, you won't catch things like Model.new.non_nullable_column)

Jul 05 '24 10:07 iMacTia

UPDATE: the :persisted option have been declared a mistake and will likely be removed in a future release, so I'd like to resume the discussion around this issue.

Alternative Proposal

I've spent some additional time looking at it and I've come up with an updated proposal. The main issue that was highlighted to me was that Opaque types would not be a good fit here, and that you technically need both classes (persisted/unpersisted) to exist at runtime.

I was able to get around this using const_set, this way both type exists at runtime as well and are identical.

We then have 2 options:

# Option A: plain Model represents the UNPERSISTED state
class Model; end
const_set(:PersistedModel, Model)

## Option A RBI: we can just inherit from Model because PersistedModel is more "restrictive" than Model

class Model < ActiveRecord::Base
  sig { returns(PersistedModel) }
  def self.find; end

  sig { returns(T.nilable(Integer)) }
  def id; end
end

class PersistedModel < Model
  sig { returns(Integer) }
  def id; end
end


# Option B: plain Model represents the PERSISTED state
class Model; end
const_set(:UnpersistedModel, Model)

## Option B RBI: here we can't use inheritance, because otherwise we would be able to pass an unpersisted record to methods that expect a persisted one

class Model < ActiveRecord::Base
  sig { returns(UnpersistedModel) }
  def self.new; end

  sig { returns(Integer) }
  def id; end
end

class UnpersistedModel
  sig { returns(T.nilable(Integer)) }
  def id; end
end

Issues

Since your code base will likely deal with PERSISTED records in most places, option A requires you to update method signatures in a lot of places.
Since with option B we can't use inheritance, UnpersistedModel won't automatically define all methods in Model. This is really annoying but I'm not sure there's a quick workaround? Should the tapioca compiler redefine ALL methods? That seems painful because it would mean relying on the compiler every time we add a helper method

Dealing with helper (custom, non-columns) methods and diff in implementation

Regardless of the solution, how do we deal with custom methods defined in Model that return a value based on the column? For example:

# this doesn't make sense, but I need to do something with id that requires it to be non-nullable
def key
  "Model##{id * 2}"
end

In option A, Model is unpersisted, so the method implementation will need to deal with id being nil, that might mean return nil if id.nil?, so the signature for the method would be returns(T.nilable(String)). In PersistedModel this signature could be reinforced to be returns(String), but we'd need to do that by hand.

In option B, Model is persisted! That means we can have the better implementation and returns(String) signature straight in there, but UnpersistedModel won't even know this method exists unless the Tapioca compiler defines it. And if the compiler does define it, there's no guarantee the method will work! And the signature would need to be adjusted. This feels "safer" from a typing perspective and in a way severely limits your usage of UnpersistedModel, which may be good by design. But if you really need to expose this method, you'd have nowhere to define it.

Conclusion

Based on the updated proposal and the limited time I could spend on it, I'd be inclined to take a closer look at option B. The extra limitations are surely gonna cause some headaches, but it seems like this approach would respect type soundness (cc @rafaelfranca, this would keep it consistent with the direction of tapioca)

Oct 25 '24 08:10 iMacTia