cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-92613: Deprecate other uuencode functionality per PEP 594 & document as such

Open CAM-Gerlach opened this issue 2 years ago • 13 comments

Per #92613 , for the following functionality deprecated by PEP 594 (PEP-594) but not formally implemented due to an oversight:

  • Low-level interfaces binascii.a2b_uu and binascii.b2a_uu for decoding and encoding uuencode data
  • The uu_codec binary transform in the codecs module, implementing uuencode as a Python codec
  • Support for decoding non-MIME uuencode payloads with the get_payload(decode=True) method of the legacy email.message.Message (compat32) API

This PR adds missing:

  • DeprecationWarnings in the Python/C code
  • Warning checks and handling in the tests
  • Deprecation notices in the docs, and
  • A descriptive entry in the What's New for 3.11

Fixes #92613

  • Issue: gh-92613

CAM-Gerlach avatar May 13 '22 03:05 CAM-Gerlach

@warsaw , I've tagged you on behalf of the email team for review of this PR's approach to handling the deprecation of uuencode in the legacy email.message.Message compat32 API, and @pablogsal , I've tagged you as release manager for 3.11, to review adding necessary deprecation messages to a couple final uuencode-related items that were explicitly slated by PEP 594 to be deprecated in 3.11 and to be removed in 3.13, and not yet formally documented as such (both per the recommendation of @brettcannon).

CAM-Gerlach avatar May 13 '22 22:05 CAM-Gerlach

@warsaw @python/email-team Any chance you could take a look at this?

To provide some more detail about the potential for a backward-compat impact of the email change, it only affects one specific method of the legacy email.message.Message object using the old compat32 policy, only when the non-default decode=True argument is passed (which was dropped in the modern email.message.EmailMessage API), and then only when the message body contains an attachment encoded with the obsolete, non-MIME uuencode codec. In this case, a clear DeprecationWarning is raised, so if this is an actual issue affecting modern usage, it can either be fixed, or we'll hear something about it over the next two and a half years and decide whether the deprecation should be postponed or reverted.

Once the actual removal occurs, get_payload would either pass the payload back without uudecoding, as it currently does for other unrecognized, obsolete or non-standard encodings, or could raise a warning or error if a uuencode content type was detected, for additional visibility.

Doing a bit more research, a grep.app search for .get_payload(decode=True) on grep.app returns 400 total hits. However, examining the first 40 results by hand, 24 were vendered/downstream copies of the email module and its test suite, 8 were tutorials or example code and 1 was Python 2-only, leaving only 7 that could theoretically be affected (extrapolated out to ≈70 total).

Furthermore, of those, nearly all are appear to be oriented toward processing new email, which is very unlikely to be receiving attachments encoded in a 1980s-vintage encoding—and in case they do, the uuencoded text will still be preserved, so they can just paste it into a uuencode decode website or program, which IIUC is exactly what people did back in the days when uuencode was still relevant; before MIME replaced it, there was no such thing as "attachments", and embedding uuencode data into MIME messages has always been non-standard.

CAM-Gerlach avatar Jun 11 '22 01:06 CAM-Gerlach

Furthermore, of those, nearly all are appear to be oriented toward processing new email, which is very unlikely to be receiving attachments encoded in a 1980s-vintage encoding

This invokes the use case I'm worried about. Processing new mail as you say shouldn't use this old encoding, but processing old emails in archives could be broken by removing this. This could happen in e.g. Mailman if you had really old archives, you wouldn't be able to parse their attachments. OTOH, maybe I'm worrying too much about that. @maxking or @msapiro can you chime in?

warsaw avatar Jun 26 '22 00:06 warsaw

The other thought is whether it's worth it to inline the uu functions in the email package, in the same way as was done for determining the subtype for audio and image data. It might be too much baggage to keep carrying around, even if made private.

warsaw avatar Jun 26 '22 00:06 warsaw

Ah. good point—in my grep.app searching, the great majority of the use cases I found were processing new (ish) email, as opposed to email archives where this could theoretically be a problem, but I didn't think about Mailman:

  • In a search for get_payload in Mailman Core, only one instance used decode=True, and that was in the test suite for an example (standard MIME) message, which would not be affected. Otherwise, all instances did not pass decode=True so the affected code path could not be invoked.
  • HyperKitty had three instances, none of which used decode=True
  • django-mailman3 had one hit for get_payload, which did not use decode=True.
  • Postorius, MailmanClient, Mailman bundler and mailman-hyperkitty had no usages, as expected. I didn't see anything else that looked particularly likely or critical.

So, unless there's something I missed, it looks like Mailman is not affected—since it is a very specific code path that needs to be hit to even have the possibility to trigger the deprecated behavior. If there are other users, we should hopefully discover them during the >two year period when DeprecationWarnings will be issued. However, if you feel its best to inline the functionality (which we can always do at any point prior to the final removal, if it has significant real-world impact, I'd be happy to help with that.

CAM-Gerlach avatar Jun 26 '22 01:06 CAM-Gerlach

@warsaw wrote:

This could happen in e.g. Mailman if you had really old archives, you wouldn't be able to parse their attachments. OTOH, maybe I'm worrying too much about that. @maxking or @msapiro can you chime in?

It seems to me that the only place where this might be an issue is in hyperkitty_import of an old mbox, but hyprerkitty_import does not use a compat32 policy. It retrieves messages as raw bytes from the mbox and parses them into a message object with message_from_bytes(msg_raw, policy=policy.default). I can't think of anywhere else where we're dealing with old mboxes that might have non-MIME uuencoded data.

msapiro avatar Jun 26 '22 04:06 msapiro

So are we chopping this out in 3.11 (if we still can), taking it out in 3.12, or leaving it behind?

brettcannon avatar Aug 05 '22 22:08 brettcannon

I assume its too late for 3.11 now that rc1 is almost upon us (especially given how things have gone), so I imagine a push to 3.12 (with removal in 3.14) is the only practical solution.

Per my discussions with @warsaw here and on Discord as well as the results of several rounds of analysis of existing code bases, the backward compat impact appears to be very minimal at best, and the deprecation period would give us time to inline the relevant code or revert specific parts of the change in case it was a significant enough real-world issue.

Assuming we agree on moving forward, I can bump it to the 3.12 What's New instead and update the various bits that reference the versions accordingly.

CAM-Gerlach avatar Aug 05 '22 23:08 CAM-Gerlach

@CAM-Gerlach yep, it's too late for 3.11.

brettcannon avatar Aug 10 '22 20:08 brettcannon

I can bump it to the 3.12 What's New instead and update the various bits that reference the versions accordingly.

I've pushed a commit updating the various deprecating warnings, documentation messages, NEWS item and What's New content to target 3.12 as the deprecation version and 3.14 as the removal version.

CAM-Gerlach avatar Aug 11 '22 04:08 CAM-Gerlach

I re-rebased this again to fix the merge conflict and update to the latest main (since I'd already rebased it before for the same reason, and there was no outstanding review activity since then, so it wasn't going to do any further harm—per this repo's standard workflow, I now avoid rebasing and rebase-updates on PRs once they have been reviewed, absent special circumstances, but the previous rebase was before I'd adapted to this).

@brettcannon @warsaw would you like to review this, so we can start evaluating any potential impact of these deprecations as early as practical (if needed, I can undeprecate or revise the deprecation strategy here based on feedback, either before or after the 3.12 release). Thanks!

CAM-Gerlach avatar Nov 08 '22 20:11 CAM-Gerlach

Sorry for the force-push; in the last commit I accidentally committed several completely unrelated local test scripts that were untracked in my working tree, so I had to prune those out of the history.

CAM-Gerlach avatar Nov 08 '22 21:11 CAM-Gerlach

I consider this primarily an email team call at this point.

brettcannon avatar Nov 09 '22 21:11 brettcannon

See issue #100308 for a non-email use of uuencoding. This usage is for a newly generated file, and is a non-email example.

ericvsmith avatar Dec 22 '22 22:12 ericvsmith

While it doesn't necessarily capture proprietary-only uses, a grep.app search for the binascii (a2b|b2a)_uu functions which searches all of GitHub only found 76 hits, of which all but ≈12 were just vendored copies of the stdlib, and at least half of the remaining were in tests, of stdlib or third-party code (and one was for EDGAR).

Also, for reference, a grep.app search for the already-deprecated import uu, there are 46 hits for import uu (plus 1 for from uu), of which only ≈7 are not vendored copies of the stdlib, and most are test code (and 1 for EDGAR). In general, these numbers are much lower than the 100s+ of uses of other modules deprecated by PEP 594). By comparison, there are around 25 000 + 6000 hits for import base64 + from base64.

If the functionality is still useful, you could consider maintaining a PyPI package with it, which would give you much more flexibility to fix issues like the one you mentioned, and deploy fixes and enhancements much more quickly than with a whole new Python feature version. I'd be willing to help advise you on what code would need to be moved, and how you might most easily structure it, as well as suggest migration advice so that existing use cases can move to it seamlessly with minimal or no changes.

CAM-Gerlach avatar Dec 22 '22 23:12 CAM-Gerlach

.. deprecated-removed:: 3.12 3.14

We need to decide on this PR and its parent issue because 3.12 release is approaching.

arhadthedev avatar May 25 '23 05:05 arhadthedev

See also:

  • PR #104932
  • PR #104773

vstinner avatar May 25 '23 14:05 vstinner

All modules scheduled for removal by PEP 594 have been removed in Python 3.12 and Python 3.13. The last batch was done in https://github.com/python/cpython/issues/104773 especially the uu module.

@CAM-Gerlach: Would you mind to solve the conflicts on your PR?

vstinner avatar May 26 '23 13:05 vstinner

We need to decide on this PR and its parent issue because 3.12 release is approaching.

The main branch is now Python 3.13. Honestly, IMO it's ok to keep this code a little bit longer, it has been there for maybe 30 years. 1 or 2 more Python releases is fine. At the same time, I'm also fine with deprecating it in Python 3.12 beta2, but you might ask to Release Manager to validate that.

vstinner avatar May 26 '23 13:05 vstinner

Oh. I wasn't aware that the email package was using the binascii.a2b_uu() function. Is it really important to remove that feature of the email module? Maybe we can only deprecate binascii.b2a_uu() and keep binascii.a2b_uu()?

vstinner avatar May 26 '23 13:05 vstinner

While the uuencode format may no longer be used in the wild today, using the email module to parse of old archives old mailing lists or old Maildir files remain an interesting use case, no?

binascii_a2b_uu() is about 86 lines of C code, whereas Lib/uu.py was about 216 lines of Python code. The uu module removal reduced the Python maintenance burden. But maybe here is balance is more towards supporting reading old email backups?

vstinner avatar May 26 '23 13:05 vstinner

We need to decide on this PR and its parent issue because 3.12 release is approaching.

That all should be updated to 3.13 -> 3.15 if we decide we want to do this.

@CAM-Gerlach: Would you mind to solve the conflicts on your PR?

I would, yeah, and update the various versions that need to be bumped again, but I'd like to hear at least a +0.5 from @warsaw or the @python/email-team before going ahead with all of it.

At the same time, I'm also fine with deprecating it in Python 3.12 beta2, but you might ask to Release Manager to validate that.

I thought it was policy to not introduce new deprecations during the beta period, but maybe I'm wrong here? IMO, if we're going to do this (which I still think we should), it would be best to push it to Python 3.13 given we want to give as much time as practical for users to report any issues which might cause a reconsideration of the various deprecations/planned removals (and conversely, makes a lot more comfortable going ahead with this now despite a few potential corner cases where this might still be used).

Oh. I wasn't aware that the email package was using the binascii.a2b_uu() function. Is it really important to remove that feature of the email module? Maybe we can only deprecate binascii.b2a_uu() and keep binascii.a2b_uu()?

See the extensive discussion above—it's only actually ever used if a non-default value is passed to an optional parameter of one method of one class in the legacy email interface, and would only cause a non-fatal degradation in content for extremely old messages (uuencded attachments would not be decoded automatically). IMO, I'm not sure its worth just removing one of the two parallel methods just for that—is there a precedent? If there is significant usage, they will get a clear deprecation warning, and if it is an important-enough problem, we can just revert that part of the deprecation.

However, if there isn't the risk tolerance for that, then it seems we may as well just leave both.

While the uuencode format may no longer be used in the wild today, using the email module to parse of old archives old mailing lists or old Maildir files remain an interesting use case, no?

Yeah, that's something we looked carefully at, but Mailman, Hyperkitty, Postorios, and related mailing list tools had zero affected usages and of the few usages in the wild of that particular legacy method, only a very small number could even theoretically be affected, and all those we found were oriented toward processing new email rather than archives. And the worst that would happen is those non-standard "attachments" would be displayed as plain text and decoded manually, just like people used to have to do back in uuencode's day.

And again, if this is a significant concern based on response to the deprecation warnings (and we could perhaps even roll out user-visible warnings if necessary), we could either provide clear guidance to putting the code on PyPI, postpone the removal, or un-deprecate that.

CAM-Gerlach avatar Jun 03 '23 06:06 CAM-Gerlach