edx-platform icon indicating copy to clipboard operation
edx-platform copied to clipboard

fix: replaced md4 with blake2b

Open Anas12091101 opened this issue 10 months ago • 8 comments

Description

This PR replaces md4 with blake2b in memcache. At MIT, we recently encountered an issue after upgrading our base image from a debian-buster image to a debian-bookworm image. This uncovered an incompatibility between the newer version of openssl and one of the hash digests that edx-platform is using (1.1.1d-0+deb10u7). Basically, openssl + hashlib are saying that md4 is supported and available but if you actually try to invoke openssl md4 you’ll get an error saying it is not available. This PR resolves this issue by updating the hash function to something a bit newer and more widely supported.

Useful information to include:

  • Which edX user roles will this change impact? Learner, Course Author

Supporting information

https://discuss.openedx.org/t/openssl-upgrade-md4-hashing/12654

Testing instructions

  • In the lms container run python manage.py lms shell
  • In the python shell run:
>>> import hashlib
>>> blake2b = hashlib.new("blake2b", digest_size=16)
>>> string = "abc"
>>> blake2b.update(string.encode('utf-8'))
>>> blake2b.hexdigest()
  • You should get the hash without any errors in the above statements.

Deadline

"None"

Other information

Include anything else that will help reviewers and consumers understand the change.

  • Relevant openssl issue: https://github.com/openssl/openssl/issues/21247

Anas12091101 avatar Mar 28 '24 18:03 Anas12091101

Thanks for the pull request, @Anas12091101! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

openedx-webhooks avatar Mar 28 '24 18:03 openedx-webhooks

@Anas12091101 I know this caused problems with launching the gradebook MFE. Do you know what else in edx-platform might use this caching scheme?

pdpinch avatar Mar 28 '24 18:03 pdpinch

My only concern is that this will have the effective of clearing the entire cache, potentially causing a disruption. (On edx.org we only clear memcache as a last resort, as it logs out all sessions and causes a performance hit.)

So, a few thoughts:

  • Is there some option for a rotation mechanism? Fetching two keys (one blake2b, one md4) would be a much higher traffic burden, but maybe there's something clever we can do.
  • Would it be possible to just use a different implementation of MD4? MD4 is a weak hash function for cryptographic purposes, but should be fine for sharding.

timmc-edx avatar Mar 28 '24 20:03 timmc-edx

@Anas12091101 I know this caused problems with launching the gradebook MFE. Do you know what else in edx-platform might use this caching scheme?

@pdpinch I think by default in LMS and CMS memcache is being used as the caching scheme in edx-platform. So anywhere in edx-platform where we are using cache.get, we are calling the safe_key fn in the memcache.py

Anas12091101 avatar Mar 29 '24 07:03 Anas12091101

One option would be to put this in a try/except block catching the ValueError that returns when md4 is not a supported algorithm and falling back to Blake2B. That doesn't solve for the case when deploying a new set of instances with the newer OpenSSL that no longer supports that method.

On the note of using a different implementation of md4, that will give a stable target (for now) but introduces the potential for stagnation, as that algorithm is (rightfully) not supported any more. That leaves us in the situation of either accepting a one-time hit of dumping cached items or developing a migration path.

All of that being said, the hashed key is only used if the combined values of key, key_prefix, and version exceeds 250 characters. Given that the cache is designed to be a performance enhancement, not a functional requirement and that only a subset of cached values will be impacted I think that this should be a safe change to merge without developing a complex stack of migration logic.

blarghmatey avatar Mar 29 '24 16:03 blarghmatey

Good to know this only affects larger keys. I might put in a quick PR to add telemetry for checking what key sizes we're seeing on edX.

I think it might make sense to put this behind a feature switch and make a DEPR out of removing the md4 option entirely. Switching to blake2b as a default seems like a very reasonable thing to do, but having a well-defined timeframe would be helpful here.

From personal experience, while caches are nominally a performance enhancement, a full cache can very quickly become a critical requirement for system stability. I'd like to understand which situation we're in here before saying that I'm comfortable with a hard cutover.

timmc-edx avatar Apr 01 '24 00:04 timmc-edx

@timmc-edx what do you think?

pdpinch avatar Apr 03 '24 00:04 pdpinch

Looks good! The feature flag would allow us to schedule a time to do a cutover in a controlled way (and would give us time to research which keys would be affected).

timmc-edx avatar Apr 03 '24 13:04 timmc-edx

@Anas12091101 would you mind squashing your commits?

pdpinch avatar Apr 18 '24 15:04 pdpinch

@timmc-edx we took the route of adding a feature flag to ease the changeover. Is this OK to merge from your perspective?

After this merges, we'll open a DEPR for removing the md4 code and the feature flag, but it wouldn't be removed until after the Redwood or Sumac releases, depending on timing.

pdpinch avatar Apr 18 '24 15:04 pdpinch

Yes, I like this approach, and have no reservations about it merging.

timmc-edx avatar Apr 22 '24 02:04 timmc-edx

@Anas12091101 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

openedx-webhooks avatar Apr 22 '24 18:04 openedx-webhooks

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

edx-pipeline-bot avatar Apr 22 '24 23:04 edx-pipeline-bot

2U Release Notice: This PR has been deployed to the edX production environment.

edx-pipeline-bot avatar Apr 23 '24 00:04 edx-pipeline-bot