mastodon icon indicating copy to clipboard operation
mastodon copied to clipboard

Rails 7.0 update

Open mjankowski opened this issue 2 years ago • 6 comments

Re-do of https://github.com/mastodon/mastodon/pull/24241

Background on merge/revert here: https://github.com/mastodon/mastodon/pull/24241

This contains the same changes as the previous PR, rebased against main after the merge/revert commits and a few other recent ones.

mjankowski avatar Jul 02 '23 12:07 mjankowski

This pull request has merge conflicts that must be resolved before it can be merged.

github-actions[bot] avatar Jul 02 '23 18:07 github-actions[bot]

This pull request has resolved merge conflicts and is ready for review.

github-actions[bot] avatar Jul 02 '23 18:07 github-actions[bot]

This pull request has merge conflicts that must be resolved before it can be merged.

github-actions[bot] avatar Jul 12 '23 10:07 github-actions[bot]

This pull request has resolved merge conflicts and is ready for review.

github-actions[bot] avatar Jul 12 '23 12:07 github-actions[bot]

Also, I'm a bit worried about https://guides.rubyonrails.org/upgrading_ruby_on_rails.html#key-generator-digest-class-changing-to-use-sha256 as I don't see it being addressed in this PR.

Will review and report back.

In either case, I will very shortly proceed to test this PR in a real environment!

Cool.

mjankowski avatar Jul 12 '23 14:07 mjankowski

Also, I'm a bit worried about https://guides.rubyonrails.org/upgrading_ruby_on_rails.html#key-generator-digest-class-changing-to-use-sha256 as I don't see it being addressed in this PR.

It indeed logged me out when testing it in a production environment.

ClearlyClaire avatar Jul 12 '23 15:07 ClearlyClaire

It indeed logged me out when testing it in a production environment.

Did some reading on this. It seems conceptually similar to the cache format thing both in that a) the damage from mishandling it will be annoyance (people needing to sign in again) but not data/security issues; b) we can set some interim config options to leave in place for a release or two, then finish the migration in later release.

Anyway, there are two changes:

  • The cookie serialization format changing from a ruby Marshall to JSON.
  • The cookie digest changing from SHA1 to SHA256

For the format change, we can set a config option to :marshall (preserve old format) or :hybrid (Start to migrate sessions) for now, and then disable that in some future release where most regular users will have been migrated already, and anyone who hasn't visited in a long time won't be surprised to be signed out.

For the digest change, we can do a similar thing with the rotator where we leave it in place for a release cycle or two so that everyday users will be migrated without getting signed out, and then we can remove it.

If you agree with that general approach here let me know and I'll update the PR.

Notes:

  • This would only impact browser-based cookie/session auth, API token access would not get disrupted
  • I believe the cookies.signed['_session_id'] managed by devise and used by the app is the most critical thing to rotate correctly -- any others?

mjankowski avatar Jul 12 '23 18:07 mjankowski

Using :hybrid and a rotator for a few releases seem like the best course of action. There's cookies.signed['_session_id'] but also just the encrypted-cookie-backed session store managed by Rails.

ClearlyClaire avatar Jul 12 '23 20:07 ClearlyClaire

Serializer -- disregard previous remarks. We actually already opt-in to this behavior from a previous rails upgrade -- https://github.com/mastodon/mastodon/blob/main/config/initializers/cookies_serializer.rb -- I think at this point we could remove that file because JSON is a default and no longer an opt-in.

Rotator -- I pushed a commit which adds the sample rotator from the upgrade post. In my local testing, I was able to confirm a minimal sign-in preservation fail w/out the rotator and succeed with the rotator. (I signed in running main, left browser open, stopped server, switched to rails 7 branch, started server, refreshed page -- remained signed in with rotator and was signed out without rotator.)

mjankowski avatar Jul 13 '23 00:07 mjankowski

Thanks for your work on this! The key rotator seems to work, I've merged the PR on my single-user server and did not notice any issue yet. I believe this is ready for inclusion.

ClearlyClaire avatar Jul 13 '23 07:07 ClearlyClaire

A behavior change I just noticed is IIRC, assets:precompile used to install yarn dependencies before running webpack, but that does not seem to happen anymore.

This is because railties dropped the yarn:install dependency from the assets:precompile task. This is not necessarily a bad thing, but it is a noteworthy change.

ClearlyClaire avatar Jul 13 '23 11:07 ClearlyClaire

Deployed to vmst.io, my own session was briefly logged out (like 5 seconds) as it rolled between k8s pods, but it's working fine now.

vmstan avatar Jul 13 '23 14:07 vmstan

A behavior change I just noticed is IIRC, assets:precompile used to install yarn dependencies before running webpack, but that does not seem to happen anymore.

This is because railties dropped the yarn:install dependency from the assets:precompile task. This is not necessarily a bad thing, but it is a noteworthy change.

Probably same issue - https://github.com/rails/rails/issues/46148

mjankowski avatar Jul 13 '23 14:07 mjankowski

Deployed to vmst.io, my own session was briefly logged out (like 5 seconds) as it rolled between k8s pods, but it's working fine now.

Just to confirm -- the session was preserved and you stayed signed in without needing to re-auth ... but you just had a normal connectivity drop during the k8s rollover?

mjankowski avatar Jul 13 '23 14:07 mjankowski

The session was preserved. I was watching Sidekiq queues show up and then the whole page went blank, which doesn't normally happen. I went to the main page and I was logged out but then when I refreshed it was fine.

However I'm seeing issues with 2fa codes no longer working. Hardware token is fine but code generators are not working.

vmstan avatar Jul 14 '23 02:07 vmstan

Also confirming 2FA is not working. I had/have to go into the DB for all users with 2FA and remove OTP manually

WhiskeyOmega avatar Jul 14 '23 02:07 WhiskeyOmega

The session was preserved. I was watching Sidekiq queues show up and then the whole page went blank, which doesn't normally happen. I went to the main page and I was logged out but then when I refreshed it was fine.

Thanks for confirming.

However I'm seeing issues with 2fa codes no longer working. Hardware token is fine but code generators are not working.

These are cases where the OTP was setup and working prior to update, and now post update even if initial password auth works fine, the otp verification step fails?

If so, is the request blowing up with error, or just normal failure as though they had incorrect code?

mjankowski avatar Jul 14 '23 03:07 mjankowski

If so, is the request blowing up with error, or just normal failure as though they had incorrect code?

Initial password works Incorrect code is then thrown up.

WhiskeyOmega avatar Jul 14 '23 03:07 WhiskeyOmega

That is weird, I have no issue logging in using TOTP. I'm unsure what could be different between our setups. The gem used for generating/checking TOTP codes also does not have any Rails-related dependency. So my only guess is that something is amiss with attr_encrypted?

ClearlyClaire avatar Jul 14 '23 07:07 ClearlyClaire

@vmstan @WhiskeyOmega if you haven't cleared TOTP parameters for your account yet, can you try a few things?

From Rails 7, in a Rails console (RAILS_ENV=production bundle exec rails c) check the output of Account.find_local('your_username').user.otp_secret. Is there any output? If there is any output, DO NOT SHARE THIS VALUE

If there is some output, can you run the same thing from a pre-Rails 7 copy of the code (do not start mastodon-web or mastodon-sidekiq with it though) and compare they are the same value?

If the first value was empty or the two did not match, can you share the output of Rails.configuration.x.otp_secret&.size in a Rails console?

ClearlyClaire avatar Jul 14 '23 10:07 ClearlyClaire

Ok so my bad. I was using the TOTP for a different app when I tested.

Sorry 🫢

vmstan avatar Jul 14 '23 12:07 vmstan

One thing I missed with the cookie situation is that when doing zero-downtime migrations we may mix Rails 7 and Rails 6 workers, in which case it will lead to temporary hiccups with unreadable cookies… is there something we can do to make the transition smoother?

ClearlyClaire avatar Jul 14 '23 13:07 ClearlyClaire

One thing I missed with the cookie situation is that when doing zero-downtime migrations we may mix Rails 7 and Rails 6 workers, in which case it will lead to temporary hiccups with unreadable cookies… is there something we can do to make the transition smoother?

Hmm, right ... I assume this is around the scenario where you are doing a rolling deploy across multiple app servers and someone:

  • Makes a request with their old-format cookies
  • Req gets routed to an already restarted app server running 7.0 code, and cookie format gets rotated
  • Next req in gets routed to an app server NOT yet restarted in the rolling restart, and thus the upgraded format can't be read and they are signed out

There's probably a solution there at the IP/http/routing layer in a datacenter (ie, change config to track which connections have been routed to an updated app server, and keep them in the pool of updated servers for the duration of the rollout), but that's going to be specific to each setup and might reduce some of the benefit of the rolling deploy.

mjankowski avatar Jul 14 '23 16:07 mjankowski

I'm wondering if we couldn't flip the thing around: write Rails 6.1 compatible cookies and support Rails 7 cookies, then in a later version switch that around to support both cookies but emit Rails 7 compatible cookies, before finally dropping Rails 6 cookies in a later release?

ClearlyClaire avatar Jul 14 '23 16:07 ClearlyClaire

I'm wondering if we couldn't flip the thing around: write Rails 6.1 compatible cookies and support Rails 7 cookies, then in a later version switch that around to support both cookies but emit Rails 7 compatible cookies, before finally dropping Rails 7 cookies in a later release?

This seems to be the sensible solution

renchap avatar Jul 15 '23 08:07 renchap

I'm wondering if we couldn't flip the thing around: write Rails 6.1 compatible cookies and support Rails 7 cookies, then in a later version switch that around to support both cookies but emit Rails 7 compatible cookies, before finally dropping Rails 7 cookies in a later release?

I think there might be a typo in the last part here? ("dropping rails 7 cookies" should be the SHA1 Rails 6 cookies?). That aside, I think what you are suggesting is something like:

  • Drop the cookie rotation code, and don't plan to rotate the existing cookies
  • Change the main/rails-7 branch to issue session cookies using a new name, so they dont collide
  • Change the session auth logic to something like "first look for cookie with new name/SHA256, if present attempt to auth with it; if not present, look for the old-name/SHA1 cookie, if present, attempt to auth with it". I'm assuming that "attempt to auth with it" is going to be possible/testable but havent confirmed this. Presumably its a matter of passing in an a digest strategy instead of relying on the default.
  • Let this code run during the initial upgrade rolling deploy

This would allow the workers still running the pre-7 code to keep working during the rolling deploy, because even if a browser had hit an upgraded worker in a prior request, the SHA1/old-name cookie would still be there unmodified. The updated workers would try to read the new name first and use it if present, but fall back on the old ones (but not change them).

In a future release, after some value of "enough time", we could drop the support for reading the old format on the assumption that frequent enough users would have gotten the new format in the new name by then.

I have not researched how straightforward either of:

  • The fallback/backup code would be? I'm assuming Rails makes it possible to drop in a different digester, but haven't checked
  • Whatever configuration is needed between devise and other auth code to configure the new cookie name, or whether having the two at once approach would be complex or not there.

mjankowski avatar Jul 15 '23 12:07 mjankowski

I think there might be a typo in the last part here? ("dropping rails 7 cookies" should be the SHA1 Rails 6 cookies?).

Definitely a typo, I meant “dropping Rails 6 cookies”, sorry!

That aside, I think what you are suggesting is something like: […]

No, sorry, that's not what I was suggesting! I meant, if possible, continue to use SHA1 in the next release, but also support reading from SHA256 (which wouldn't be used in the next release). Then, in a later release, switch to SHA256, with a rotator supporting SHA1 cookies (switch to the current code). There does not need to be any significant delay between those two releases, as they could safely run concurrently. Then a later release would drop SHA1 cookies support altogether.

EDIT: See #26023

ClearlyClaire avatar Jul 17 '23 06:07 ClearlyClaire

I meant, if possible, continue to use SHA1 in the next release, but also support reading from SHA256 (which wouldn't be used in the next release).

I'll comment on #26023 to try to work through this more, but I didn't think that "continue to use SHA1 in next release" and "also read from SHA256" ... are actually possible at the same time?

My mental model was that there are sort of two discrete components:

  • What cookie digester is being used? This applies both to reading inbound cookies on each request, and also on writing/setting outbound on responses. In my head these were both keyed off of that key_digest_hash_generator_class setting, and could not be separate things from each other. Maybe that's not the right model and they can be different?
  • Is there a rotator in place? If so, that's going to apply to how inbound cookies are handled and whether they are actively modified on each request when they need to be, but it's a somewhat isolated issue from the digest generator class setting.

Will comment more on the PR.

mjankowski avatar Jul 17 '23 13:07 mjankowski

Still having users mention their 2FA is broken so its not like it fixes it over time in changing from Rails 6 to 7 workers. Was anything else found that could possibly be the issue for 2FA not working ?

WhiskeyOmega avatar Jul 20 '23 18:07 WhiskeyOmega

Still having users mention their 2FA is broken so its not like it fixes it over time in changing from Rails 6 to 7 workers. Was anything else found that could possibly be the issue for 2FA not working ?

We haven't seen any other report. If you get the consent of one of your affected users, can you try the steps listed in https://github.com/mastodon/mastodon/pull/25668#issuecomment-1635620561?

Also, have you possibly made unrelated changes to your .env.production?

ClearlyClaire avatar Jul 20 '23 18:07 ClearlyClaire