Suggestion: Implement `<Model>::Persisted` opaque type to solve column nilability confusion
The Problem
The tapioca compiler for ActiveRecord models makes most columns nilable, ignoring database constraints.
This is correct on paper, because when you call Model.new none of those would matter, until you eventually try to persist the record in the database.
In practice though, this is a common pitfall and cause for confusion, especially for those just getting started with Sorbet.
This is not a new problem, it has in fact been discussed in the past both here and in the Sorbet Slack workspace:
- https://github.com/Shopify/tapioca/issues/600
- https://github.com/Shopify/tapioca/issues/1342
- https://github.com/Shopify/tapioca/issues/1642
Existing Solutions
There are currently two possible paths to address this:
- The most common one is to just deal with this in the code and use
T.mustwhenever Sorbet complains about nilability and we know the record has been persisted, so the column cannot benil. - The other, less known and currently undocumented solution is to define a
StrongTypeGenerationmodule in the model to tell the compiler to generate the class following the database constraints (as explained in this comment). This will make the class work in most cases, but it won't allow you to call a non-nullable column on a newly instantiated model without throwing an error (e.g.Model.new.non_nilable_column)
The two solution seem mutually exclusive and present pros/cons to different cases, so developers will always need to weight in with additional T.must or T.unsafe to "correct" Sorbet misunderstandings.
Ideally, we'd like to teach sorbet exactly what to expect.
A proposal for an alternative solution
I recently stumbled across the concept of "opaque types" thanks to an article from Jez.
After reading the article, I began wondering if it wouldn't be possible to change the compiler so that on top of the Model class, a Model::Persisted or PersistedModel class would be generated as well.
The two classes would be identical and they'd only differ by the fact the the latter will have stricter constraints on columns, reflecting database constraints.
Then, methods signatures can be changed to return one type or the other. For example:
# app/models/user.rb
class User < ActiveRecord::Base
# == Schema Information
#
# Table name: users
#
# id :integer not null, primary key
# email :string default(""), not null
# encrypted_password :string default(""), not null
# created_at :datetime not null
# updated_at :datetime not null
end
# user.rbi
module CommonMethods
sig { returns(PersistedUser) }
def save; end
sig { returns(PersistedUser) }
def find_by; end
sig { returns(PersistedUser) }
def update; end
end
class User
include CommonMethods
sig { returns(T.nilable(Integer)) }
def id; end
sig { returns(T.nilable(String)) }
def email; end
...
end
class PersistedUser # or User::Persisted ?
include CommonMethods
sig { returns(Integer) }
def id; end
sig { returns(String) }
def email; end
...
end
cc @rafaelfranca @Morriar @KaanOzkan based on your past contributions to the topic 🙏
Gentle nudge on this one, I'm not pretending to see a PR to get this fixed, I merely want to kick off the discussion to understand if this makes sense or not and if you'd like to introduce it in tapioca, then I'm happy to take a stab at it myself if necessary 😄
The column can be nilable so we can't always use PersistedUser. In your example CommonMethods will always return PersistedUser. Also the common methods are defined under ActiveRecord RBIs and shared for all models.
Would something like this work? Having User and PersistedUser classes each with their own set of methods and developer could manually specify if an instance is persisted and use that type rest of the way.
# sorbet/rbi/dsl/persistent_user.rbi
class PersistedUser
sig { returns(Integer) }
def id; end
sig { returns(String) }
def email; end
end
# models/user.rb
class User < ApplicationRecord
sig { returns(PersistedUser) }
def save
super
end
end
The column can be nilable so we can't always use PersistedUser. In your example CommonMethods will always return PersistedUser. Also the common methods are defined under ActiveRecord RBIs and shared for all models.
I might have chosen the wrong names and created some confusion, but if I check the tapioca-generated RBIs for my project's models, they each have these methods specified (except for save, maybe because it actually returns a boolean?).
So apologies if my example is not precise, but the idea is for all those methods (find, first, last, etc...) to return a PersistedUser rather than User. Obviously, new would return User since we'd expect those columns to be nilable, but that would be one of the very few cases where we'd deal with a standard User, I believe?
Would something like this work? Having User and PersistedUser classes each with their own set of methods and developer could manually specify if an instance is persisted and use that type rest of the way.
I don't think this would work, because then the two classes would be completely different and not interchangeable. Let's not forget that PersistedUser would not exist at runtime, so whenever we use it in a signature it should be able to represent also User.
Playing some more with this and I managed to get something to work in Sorbet's playground: see example
I had to abandon the idea of a Persisted::User in favour of Unpersisted::User: this is because when you deal with the union type, T.nilable is "contagious" so it makes sense to flip the proposal.
This effectively means we'd always deal with persisted records, except for when you call new or build, where you'd get an Unpersisted::<model> record (yes, I hate the name, I'm sure we'll find something better).
This is a really good idea and would be a huge improvement to Tapioca.
How would you implement save? This method returns a boolean but would need to narrow the type to User. As far as I understand, type narrowing methods don't exist in sorbet. I opened this issue over a year ago that has yet to be addressed.
In any case, I think this would be a huge improvement to Tapioca's models, allowing 100% type safety without sacrificing the convenience of not having to assert everywhere.
This is a really great idea! Would love to help however I can to try to make this more real, since it would definitely be a UX improvement. And I actually like the inversion to have Unpersisted::User because that's probably the less common flow, so most of the time, you wouldn't need to cast or worry about these alternate types.
Is there a path to land this sooner as we work it out if, instead of requiring the Unpersisted types to be returned by ActiveRecord APIs, we start by just including the new type definitions for folks and we can manually cast as needed for our use cases? That would still be an improvement over T.must all over the place. But have never contributed directly to Tapioca so open to whatever makes sense!
@stathis-alexander @alexgraffeocohen sorry I didn't mean to drop the ball on this, and I'm still very much interested in making this work. I'm just struggling to find the time at the moment 😅.
If any of you would like to push this forward or make any progress, here is what a possible roadmap could look like:
- Get a PoC working on the Sorbet playground <-- we're here, the one I shared in my last comment seems a good starting point
- Test in a real Rails application by manually creating the RBIs based on the PoC. This can be a sample app, doesn't need to be enterprise/production.
- Once we're confident with the RBIs, the next step is to write a tapioca compiler that will generate them automatically. NOTE: This might also require changes to the existing AR compilers to avoid conflicts on signatures.
How would you implement save?
I'm not sure I understand the need for narrowing the type. You don't need to pass a record as a parameter, so the signature would simply be sig { returns(T::Boolean) } 🤔
Is there a path to land this sooner as we work it out
I don't think there is. Sorbet requires these signatures to be statically evaluated, meaning if we want the same object to behave differently, we'll effectively need 2 separate objects.
As a temporary stop-gap solution, we were considering adding a new method like id!:
class ApplicationRecord
sig { returns(Integer) } # _not_ nilable
def id! = id or raise SomeError
end
Creating a module that overrides the existing id would be even better!
@iMacTia
I'm not sure I understand the need for narrowing the type. You don't need to pass a record as a parameter, so the signature would simply be sig { returns(T::Boolean) } 🤔
If I instantiate a new object:
my_object = MyObject.new(**parameters)
reveal_type(my_object) # should show unpersisted object type
it should by default have the Unpersisted type. Later, when I save that object, it needs to "narrow" into the persisted type.
my_object.save!
reveal_type(my_object) # should show persisted type
I don't believe sorbet has any support for this kind of thing at the moment.
@stathis-alexander Ah no I don't think that's possible at all! That would mean changing the type of the variable simply by calling a method on it. However, sorbet does allow re-assigning a variable (with some limitations, see error 7001), so you should be able to do the following:
my_object = MyObject.new(**parameters)
reveal_type(my_object) # Unpersisted::MyObject
my_object = my_object.save!
reveal_type(my_object) # MyObject
# create_table :comments, force: true do |t|
# t.integer :post_id, null: false
# end
comment = Comment.find(1)
comment.post_id = nil
comment.valid?
This code is totally valid code and should still work and with what you are proposing it would not.
Worse yet, something like this would explode at runtime but pass type checking with what is being proposed here.
class Comment < ApplicationRecord
before_validation :transform_post_id
def transform_post_id
self.post_id = self.post_id + 1
end
end
comment = Comment.find(1)
comment.post_id = nil
comment.valid?
I don't think the type system should lie about the runtime behavior of the system.
This code is totally valid code and should still work and with what you are proposing it would not.
Fair, but how common would that use-case be? And what if it could be solved with a T.cast(comment, Unpersisted::MyObject)?
Worse yet, something like this would explode at runtime but pass type checking with what is being proposed here.
I'm not sure I follow this one, the last three lines are the same as the example above which we agreed would not pass type-checking, so I guess there's a typo in the code unless I'm missing something obvious 🤔
Anyway, I completely understand the "the type system shouldn't lie" position and honestly it's hard to argue against it. My point I guess is: is it right to cause friction on development for the sake of correctness? From my standpoint, I'd favour a solution that might sacrifice some correctness if it helps improving the experience of the majority of cases.
For example (which might relate to your second snippet above), I'd prefer dealing with an immutable object retrieved from the DB but that it gives me some guarantees, rather than having to deal with constant nullability.
To achieve that, we could make all setters forbidden (e.g. post_id= or assign_attributes) on persisted object, and if you do need to make changes then type casting or even re-instantiation might be needed.
Now that I said that out loud, I wonder if this might pair well with readonly! 😲
Anyway, I completely understand the "the type system shouldn't lie" position
I think tapioca as a community has already taken the position that accuracy is more important than ergonomics, for better or worse.
I think the proper way to handle this dichotomy is to allow this as a configuration option. I think many would agree that the convenience here is worth the slightly less accurate typing, as evidenced by the number of questions, issues, etc that come up concerning the default nil-able types generated for active record models with non-nil validations (or non-null column constraints). Those of us who feel that way should be able to toggle tapioca into a more relaxed type system.
Ultimately, sorbet was built to enable higher developer productivity and improved developer experience.
Making this configurable would definitely make everyone happy, but as an OSS maintainer myself, I know that's the easiest way to get things harder to maintain. So happy to leave this open a bit more to facilitate discussion, this is not a bug anyway (and to that end, happy to turn this into a Discussion if you prefer)
Rather than making another configuration option for this in Tapioca, can't you disable the default DSL compiler and provide your own with the behavior that suits you?
@Morriar - Yes, that's of course an option, although... it requires a heavy lift and would need to be maintained in parallel with the default one.
Instead, this change allows monkey patching existing behavior. I've got something worked out already for the nil-able behavior for AR columns, re-using some helper method from the now deprecated sorbet-rails:
module Tapioca
module Dsl
module Helpers
class ActiveRecordColumnTypeHelper
include SorbetRailsBorrowedMethods
# https://github.com/Shopify/tapioca/blob/main/lib/tapioca/dsl/helpers/active_record_column_type_helper.rb#L37
def column_type_for(column_name)
return ['T.untyped', 'T.untyped'] if do_not_generate_strong_types?(@constant)
column = @constant.columns_hash[column_name]
column_type = @constant.attribute_types[column_name]
getter_type = type_for_activerecord_value(column_type)
setter_type =
case column_type
when ActiveRecord::Enum::EnumType
enum_setter_type(column_type)
else
getter_type
end
if column&.null && !attribute_has_unconditional_presence_validation?(column_name)
getter_type = as_nilable_type(getter_type) unless not_nilable_serialized_column?(column_type)
return [getter_type, as_nilable_type(setter_type)]
end
[getter_type, setter_type]
end
end
end
end
end
It's on my list of things to do to host a custom compilers repo that others can share (money-rails being an example).
Can be closed by https://github.com/Shopify/tapioca/pull/1888 perhaps?
cc: @paracycle
#1888 is a very much welcomed change and we've already started using it in our application! But to be fair, the idea here is slightly different: while #1888 now allows you to choose how your AR class is typed, this suggestion wants to actually have 2 different classes (statically) to properly deal with both cases.
The solution provided by #1888 asks you to choose between the two cases and apply that everywhere, even where it doesn't really holds true (e.g. if you go :persisted, you won't catch things like Model.new.non_nullable_column)
UPDATE: the :persisted option have been declared a mistake and will likely be removed in a future release, so I'd like to resume the discussion around this issue.
Alternative Proposal
I've spent some additional time looking at it and I've come up with an updated proposal. The main issue that was highlighted to me was that Opaque types would not be a good fit here, and that you technically need both classes (persisted/unpersisted) to exist at runtime.
I was able to get around this using const_set, this way both type exists at runtime as well and are identical.
We then have 2 options:
# Option A: plain Model represents the UNPERSISTED state
class Model; end
const_set(:PersistedModel, Model)
## Option A RBI: we can just inherit from Model because PersistedModel is more "restrictive" than Model
class Model < ActiveRecord::Base
sig { returns(PersistedModel) }
def self.find; end
sig { returns(T.nilable(Integer)) }
def id; end
end
class PersistedModel < Model
sig { returns(Integer) }
def id; end
end
# Option B: plain Model represents the PERSISTED state
class Model; end
const_set(:UnpersistedModel, Model)
## Option B RBI: here we can't use inheritance, because otherwise we would be able to pass an unpersisted record to methods that expect a persisted one
class Model < ActiveRecord::Base
sig { returns(UnpersistedModel) }
def self.new; end
sig { returns(Integer) }
def id; end
end
class UnpersistedModel
sig { returns(T.nilable(Integer)) }
def id; end
end
Issues
- Since your code base will likely deal with PERSISTED records in most places, option A requires you to update method signatures in a lot of places.
- Since with option B we can't use inheritance,
UnpersistedModelwon't automatically define all methods inModel. This is really annoying but I'm not sure there's a quick workaround? Should the tapioca compiler redefine ALL methods? That seems painful because it would mean relying on the compiler every time we add a helper method
Dealing with helper (custom, non-columns) methods and diff in implementation
Regardless of the solution, how do we deal with custom methods defined in Model that return a value based on the column? For example:
# this doesn't make sense, but I need to do something with id that requires it to be non-nullable
def key
"Model##{id * 2}"
end
In option A, Model is unpersisted, so the method implementation will need to deal with id being nil, that might mean return nil if id.nil?, so the signature for the method would be returns(T.nilable(String)).
In PersistedModel this signature could be reinforced to be returns(String), but we'd need to do that by hand.
In option B, Model is persisted! That means we can have the better implementation and returns(String) signature straight in there, but UnpersistedModel won't even know this method exists unless the Tapioca compiler defines it. And if the compiler does define it, there's no guarantee the method will work! And the signature would need to be adjusted.
This feels "safer" from a typing perspective and in a way severely limits your usage of UnpersistedModel, which may be good by design. But if you really need to expose this method, you'd have nowhere to define it.
Conclusion
Based on the updated proposal and the limited time I could spend on it, I'd be inclined to take a closer look at option B.
The extra limitations are surely gonna cause some headaches, but it seems like this approach would respect type soundness (cc @rafaelfranca, this would keep it consistent with the direction of tapioca)