rails icon indicating copy to clipboard operation
rails copied to clipboard

Update `ActiveRecord::Batches#in_batches` to make use of `ActiveRecord::Base.implicit_order_column` when present.

Open HallDJack opened this issue 2 years ago • 14 comments

Summary

When using ActiveRecord::Batches#in_batches ActiveRecord sorts records by primary key. This causes unpredictable behavior and errors when the primary key is not an auto-incrementing data type, like a UUID.

This change follows up on changes made in some earlier PRs (linked below) that made changes to the +first+ and +last+ finder calls. It updates ActiveRecord::Batches#in_batches so that it sorts by the implicit_order_column when that attribute has been configured. This allows for the batch processing of records that do not have an auto-incrementing primary key. Currently batch processing of a ActiveRecord model that has a UUID primary key does not consistently succeed.

Related PRs:

  • https://github.com/rails/rails/pull/34480
  • https://github.com/rails/rails/pull/37626

Other Information

The only slight difference in batch processing behavior comes when using batches with an implicit_order_column that is non-unique. In that case the batch size will represent the number of unique values to be processed in the batch not the number of records. For Example:

If the models's implicit_order_column has values of:

[:a, :b, :c, :c, :d, :e, :f]

And the user calls Model.in_batches(of: 3) the first batch will contain four records, [:a, :b, :c, :c], and the second batch will contain the three remaining records, [:d, :e, :f]. No records will be skipped, barring any race conditions, but the number of records processed may vary between batches.

While working on this patch I attempted to include subsorting by the primary key to remove this difference in behavior, but could not come up with something that worked and didn't change the interface to the #in_batches method. I think changes could be made to the ActiveRecord PredicateBuilder to allow for subsorting, but that is currently outside my comfort zone so I opted to not attempt those changes at this time. I feel that allowing batch processing even with this caveat is better than the current behavior.

This is my first time submitting a PR to Rails, so please let me know if I missed any steps, need to make additional changes, or add additional testing.

HallDJack avatar Jul 29 '21 18:07 HallDJack

I am currently looking at the test failures. It seems to be a difference in the fixtures present when the test is run. I am having trouble replicating the failures in my rails-dev-box VM, but will keep trying.

HallDJack avatar Jul 29 '21 20:07 HallDJack

I am currently looking at the test failures. It seems to be a difference in the fixtures present when the test is run. I am having trouble replicating the failures in my rails-dev-box VM, but will keep trying.

The issue is with differences in how mysql2 and sqlite3 sort alphanumeric values. 🤯

mysql2:
[[1, "Welcome to the weblog"], [5, "sti me"], [4, "sti comments"], [2, "So I was thinking"], [11, "other post by mary"], [10, "other post by bob"], [9, "misc post by mary"], [8, "misc post by bob"], [3, "I don't have any comments"], [6, "habtm sti test"], [7, "eager loading with OR'd conditions"]]

sqlite3: 
[[5, "sti me"], [4, "sti comments"], [11, "other post by mary"], [10, "other post by bob"], [9, "misc post by mary"], [8, "misc post by bob"], [6, "habtm sti test"], [7, "eager loading with OR'd conditions"], [1, "Welcome to the weblog"], [2, "So I was thinking"], [3, "I don't have any comments"]]

I need to update the test to handle or avoid this quirk.

HallDJack avatar Jul 29 '21 23:07 HallDJack

The conflicts are just in the change log so I don't want to spend time updating them until there is some movement one way or another on this PR since I'd just be chasing changes.

HallDJack avatar Sep 03 '21 20:09 HallDJack

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rails-bot[bot] avatar Dec 02 '21 20:12 rails-bot[bot]

@HallDJack thank you for putting this together. Was there a reason this work we abandoned?

why-el avatar Jan 10 '22 04:01 why-el

@why-el, I stopped updating this PR and addressing merge conflicts because I never received any feedback from anyone so it didn't seem worth my time to keep addressing merge conflicts. Then a bot closed the PR. I took the lack of activity from others on the PR as an indication that the changes weren't desired by the wider community.

I am not sure what additional work needs to occur for this PR to move forward. My team is still using a work around for this issue in our project so it would be great to know what I need to do to get this PR moving forward again.

HallDJack avatar Jan 10 '22 04:01 HallDJack

Ok, thanks for letting me know. I don't see a reason why it won't proceed either, at least to a meaningful discussion. Perhaps re-opening it or sending it to Rails Core mailing list might move the needle again?

why-el avatar Jan 10 '22 20:01 why-el

I'll give it a go later tonight. My last post to the mailing list didn't get any traction either. I guess people are just busy and this isn't a large issue. https://discuss.rubyonrails.org/t/requesting-reviewer-for-ar-pr-42908/78595/2

HallDJack avatar Jan 10 '22 22:01 HallDJack

@why-el now that I load this PR up on the browser I don't seem to have the ability to reopen the PR. Probably a measure to keep the PR noise down for the maintainers. So I guess this PR dies here because I don't want to be 'that guy' that keeps submitting the same PR that doesn't get any traction.

Hopefully, if people stumble across this PR in the future they will leave a message like you did. With enough interaction someone would probably reopen it for me or ask me to submit it again. If you want to message the mailing list about this PR please be my guest. Maybe hearing about it from someone who isn't the author would help.

HallDJack avatar Jan 12 '22 22:01 HallDJack

Yeah, oh well, you did what you could. I'll keep an eye on various discussions and point people back here if anyone is facing the same issue, like you said. Further nudges in the mailing list unlikely to be fruitful.

why-el avatar Jan 13 '22 00:01 why-el

I'm interested in this PR moving forward.

Especially now that Rails supports sharding databases horizontally and it becomes critical to not accidentally "cross the streams" (ghost busters reference) and select a record from the wrong database. Using UUIDs prevents that from happening.

paulyoder avatar Aug 24 '22 13:08 paulyoder

I'll reopen since it was closed for going stale

skipkayhil avatar Aug 24 '22 22:08 skipkayhil

@HallDJack let me know if you would like help resolving the merge conflicts (and thanks for doing the initial work!)

paulyoder avatar Aug 25 '22 13:08 paulyoder

@paulyoder, sorry work has been very busy for me recently so I haven't been able to get back to this PR. It would definitely be helpful to have some help resolving the merge conflicts. I haven't touched my Rails dev VM since I submitted this PR so it is probably a bit out of date.

Having this functionality make it into Rails is definitely something that would be helpful for my app and would let me remove some bodges.

What do you need from me?

HallDJack avatar Sep 13 '22 21:09 HallDJack