twine icon indicating copy to clipboard operation
twine copied to clipboard

Refactor/normalize exceptions

Open bhrutledge opened this issue 4 years ago • 11 comments

Looking over the custom exceptions in exceptions.py and searching for how they're used, I see a couple patterns. The first is an empty class, with the message provided at raise. The second is to define a message template in the exception, using arguments provided at raise with .from_args(...).

The empty class pattern is more common, but leads to duplicate, possibly inconsistent, and sometimes long message text in the code. To improve this, I propose:

  • [ ] Moving all exception messages to exceptions.py, making them templates where needed. I don't have a specific pattern in mind for this, and would prefer to follow one used by a larger project. This could mean replacing .from_args.

  • [ ] Adding subclasses of InvalidDistribution with a more specific message, for example:

    raise exceptions.DistributionFileNotFound(filename)
    raise exceptions.InvalidDistributionFormat(filename)
    
  • [ ] Combining UploadToDeprecatedPyPIDetected and InvalidPyPIUploadURL as suggested in https://github.com/pypa/twine/pull/509#discussion_r334291736

  • [ ] Looking for other opportunities to combine or extract exceptions. For example, could PackageNotFound be the same as DistributionFileNotFound?

  • [ ] Looking for opportunities to improve the exception messages. For example, somewhere I heard the concept of "help messages" instead of "error messages".

Happy to do this in multiple PR's.


PS: GitHub's code navigation is really handy. For example, when viewing exceptions.py, click a class name, see where it's referenced, and click one to go there:

Screen Shot 2020-04-15 at 6 38 15 AM

bhrutledge avatar Apr 15 '20 10:04 bhrutledge

Hi @bhrutledge

Thank you for opening this issue. I think this will also help in https://github.com/pypa/twine/issues/577.

Also I was wondering if this is something in which you are open to accepting PRs from external contributors? If that is the case, I would be happy to write up some PRs for the same.

deveshks avatar Apr 15 '20 16:04 deveshks

Thanks again for offering, @deveshks. I'd like to get feedback from the other @pypa/twine-maintainers before proceeding.

In particular, I'd like to settle on a pattern for encapsulating the error/help message in the exception classes. For example, pip seems to use __init__ arguments and __str__ in its exceptions.py.

bhrutledge avatar Apr 16 '20 00:04 bhrutledge

* Moving all exception messages to `exceptions.py`, making them templates where needed. I don't have a specific pattern in mind for this, and would prefer to follow one used by a larger project. This could mean replacing `.from_args`.

I am not fully sure of what this means. Do you mean that in addition to having all the exception classes in that file we'd have constants defined with exception strings that we then have to figure out? Something like raise exceptions.InvalidDistribution(exceptions.FILE_NOT_FOUND % filename)? I'm strongly against a pattern like that but it's not entirely clear. So far, the list isn't clear if it's all ideas building to the same goal or separate ideas for achieving/fixing different things that are bothering you.

* [ ]  Adding subclasses of `InvalidDistribution` with a more specific message, for example:

Absolutely this sounds good to me based on the usage you found. This is also backwards compatible

* [ ]  Looking for other opportunities to combine or extract exceptions. For example, could `PackageNotFound` be the same as `DistributionFileNotFound`?

I'm not sure, I'd have to carefully review the usage of either of them. One may be more an artefact of the old cheeseshop codebase, one could be more specific to preparing to upload things. I don't honestly remember.

Generally speaking, though, since we've started building a documented, backwards-compatible API into Twine, we can't make sweeping changes like combining or extracting exceptions without making the lineage of a given class very complicated and convoluted. So I'm tentative about the goal of "combining or extracting" exceptions. Also the goal of doing so isn't explicit.

* [ ]  Looking for opportunities to improve the exception messages. For example, somewhere I heard the concept of "help messages" instead of "error messages".

This makes sense. Some of our exception messages end up being in front of users. We should make the messages themselves more user friendly.


All in all, I'd still like to understand the motivation behind any of this. In general, if you weren't a member of the team, this would be a very jarring issue to receive. There's a list of things you'd like to do, but there's no reasoning as to why. What's the value you see in doing this work? How do you propose handling any problems that may arise for users of our (at this point, limited) API? Will this constitute a major version release of twine when done?

I need more information before I can say I'm :+1: or :-1: on the proposal

sigmavirus24 avatar Apr 17 '20 11:04 sigmavirus24

@sigmavirus24 Thanks for the feedback! I'm attempting to respond succinctly; happy to elaborate (via some sort of synchronous chat if desired).

I'd still like to understand the motivation behind any of this.

I can see how that's not obvious from my initial description. I've found the various patterns of exception handling to be confusing, both when reading the code, and when making/reviewing changes.

What's the value you see in doing this work?

I think it would improve the experience for maintainers, contributers, and users if error messages were defined in one place and raised with a simple call.

How do you propose handling any problems that may arise for users of our API? Will this constitute a major version release?

I started to use GitHub's code search to get a sense of if/how other projects use Twine's exceptions, but that was stymied by all the repos that have committed their virtual environment. I also feel like I've seen different opinions on the stability of Twine's API. But yeah, this feels like a major version release, possibly in conjunction with #587 and #511.

Do you mean that we'd have constants defined with exception strings that we then have to figure out?

Nope. I'm thinking along the lines of what pip seems to do:

class RedirectDetected(TwineException):
    def __init__(self, repository_url, redirect_url):
        super().__init__(
            f"{repository_url} attempted to redirect to {redirect_url}.\n"
            f"If you trust these URLs, set {redirect_url} as your repository URL."
        )


class DistributionFileNotFound(InvalidDistribution):
    def __init__(self, filename):
        super().__init__(f"No such file: {filename}")


class InvalidPyPIUploadURL(TwineException):
    def __init__(self):
        super().__init__(
            "It appears you're trying to upload to PyPI (or TestPyPI) "
            "but have an invalid URL.\n"
            f"You probably want: {DEFAULT_REPOSITORY} or {TEST_REPOSITORY}.\n"
            "Check your repository configuration."
        )


raise exceptions.RedirectDetected(repository_url, resp.headers["location"])
raise exceptions.DistributionFileNotFound(filename)
raise exceptions.InvalidPyPIUploadURL()

I'm using __init__ for brevity and to maintain compatibility with the except in main(), but I like the __str__ pattern that pip uses.

bhrutledge avatar Apr 18 '20 19:04 bhrutledge

I can see how that's not obvious from my initial description. I've found the various patterns of exception handling to be confusing, both when reading the code, and when making/reviewing changes.

How does it improve the experience for users? If users go searching for the message they're seeing, they'll only find the exceptions module, not the place using the message itself. That would be far more frustrating to me having to search for a message, then search for something else and likely not finding what's causing my problem. That adds maintenance burden because then I have to open an issue to get you to tell me why it's doing something I don't expect it to, and then you have to find what's going wrong several months or years later after you've forgotten why.

To be clear, I'm not categorically against this, but I haven't been convinced it's a good idea yet.

Nope. I'm thinking along the lines of what pip seems to do:

So exceptions in Python (generally speaking) accept a message string in __init__ and that's more-or-less the common denominator. Some accept additional arguments or keyword arguments. Many of pip's exceptions actually follow exactly that style. PipError inherits from Exception and most things just inherit from PipError without extra methods. Some of these classes do as you describe but pip's internals are as inconsistent as our own. I have to look up for every exception what I'm supposed to be passing in if I need to use it. That's awful.

I'm happy to settle on a single pattern for maintainability. In fact, as we think about the future Twine API, I think that there are aspects of what pip does that are very valuable. I think storing the relevant objects/references on the exception make it easier for a 3rd party developer to introspect the error or provide greater detail.

This discussion has mean leaning more towards the from_args class method style that was introduced at some point (I don't know when or by whom). I think what I'd most like to avoid is having a tonne of __init__ methods defined. Perhaps TwineException can have an __init__ that calls some predetermined method to populate the attributes we care about such that from_args can template the exception message and be a generic method on TwineException. Something might look like:

class TwineException(Exception):
    message = "An error was encountered while running Twine"

    def __init__(self, message, **kwargs):
        super().__init__(message)
        self._initialize_attributes(**kwargs)

    def _initialize_attributes(self, **kwargs):
        self.kwargs = kwargs

    @classmethod
    def from_args(cls, **kwargs):
        return cls(cls.message % kwargs, **kwargs)

class RedirectDetected(TwineException):
    message =  ("%(repository_url)s attempted to redirect to %(redirect_url)s.\n"
            "If you trust these URLs, set %(redirect_url)s as your repository URL.")

    def _initialize_attributes(self, repository_url, redirect_url, **kwargs):
        self.repository_url = repository_url
        self.redirect_url = redirect_url
        super()._initialize_attributes(**kwargs)

We could use __str__ do dynamically re-format the exception string every time it's needed but that seems inefficient.

sigmavirus24 avatar Apr 19 '20 13:04 sigmavirus24

I'm enjoying this discussion. 🙂

I'm happy to settle on a single pattern for maintainability.

Excellent!

How does it improve the experience for users?

This was a bit of a stretch, but I was thinking that defining exception messages in one place might help us make them more consistent/helpful.

If users go searching for the message they're seeing, they'll only find the exceptions module, not the place using the message itself.

Fair point. Maybe this could be mitigated by continuing to show the class name in the output. Or, by extending --verbose to show a traceback. That said, the extra hop from "search for message text" to "search for exception class" doesn't feel too big to me.

This discussion has mean leaning more towards the from_args class method style that was introduced at some point.

Looks like you added that in https://github.com/pypa/twine/commit/333a0a861d4a73efa922fa8c690bfbf75d6a8a41, and I followed it in https://github.com/pypa/twine/commit/e53c9b5ee051f4028f9d0218855bfb0b9c7d8381. However, I don't see the advantage of this over passing explicit arguments to __init__, which leads me to:

what I'd most like to avoid is having a tonne of __init__ methods defined.

Why is that? As you said earlier:

exceptions in Python (generally speaking) accept a message string in __init__ and that's more-or-less the common denominator.

To me, this is an argument for letting __init__ do the work. It also prompted me to go digging into how other packages define their exceptions, because I'm keen to not reinvent the wheel. Here are some examples that (at first glance) seem to rely consistently on __init__ and __str__:

Perhaps TwineException can have an __init__ that calls some predetermined method to populate the attributes.

In my previous comment, I was tempted to add something like TwineException.message that could be a class attribute or a @property, but it felt prematurely complex. Seeing the from_args and _initialize_attributes example has me feeling similarly.

bhrutledge avatar Apr 19 '20 19:04 bhrutledge

This was a bit of a stretch, but I was thinking that defining exception messages in one place might help us make them more consistent/helpful.

Don't downplay it. That makes sense to me.

Fair point. Maybe this could be mitigated by continuing to show the class name in the output. Or, by extending --verbose to show a traceback. That said, the extra hop from "search for message text" to "search for exception class" doesn't feel too big to me.

I think this is a healthy compromise, certainly. Then we also get the exact place that's causing the problem/confusion.

To me, this is an argument for letting __init__ do the work.

So message is literally the string explaining the reason for the particular exception, right? And there's definitely preceent for allowing arbitrary args and kwargs in the __init__ but what I really don't like is how Pip has exception with no common rhyme or reason. It adds cognitive overhead to trying to extend the code base. First, if you think an existing exception is appropriate for your new area of code, you have to go and figure out "What the heck am I passing to this exception class initializer?" A first pass could always be, to preserve some amount of momentum, to just pass in a string, i.e., raise InvalidDistribution("TODO: Replace me") and that will always work. (If I remember what I saw yesterday correctly) Some of pip's exceptions require arguments and what if the exception is semantically correct but you don't have every argument needed to pass in to that initializer? Do we modify the initializer to make that one optional and then each additional time make it optional?

If we go the __init__ route, while preserving the ability to pass in a message, then a new contributor could break our agreed-upon pattern of keeping exception messages co-located easily. If we go pip's route, then there's no fast way to develop the code which requires exceptions. Further, keeping in mind an API for Twine, having exceptions that are bespoke with their __init__ makes stubbing or mocking infuriating unless we do yet more work to provide test fixtures that Do The Right Thing (which is harder than it might otherwise seem).

from_args allows us to manage canonical error messages in one place (attached to the class even) which moves in the direction we both like.

I think overloading __init__ for every exception class also makes sub-classes harder to reason about. Let's say we had this pattern in place today. If we were to extend InvalidDistribution we'd likely be tempted to call super().__init__(...) (which is a different pet peeve of mine) but would the arguments in our child class always be valid for the parent class? Would we be stuck keeping arguments that are otherwise seemingly irrelevant to the exception at hand to maintain the semantic parent-child relationship? Frankly, I long for a way to compose exceptions in an elegant way rather than the sub-class/parent-child hierarchy that is the de facto standard in Python.

Also, I'm entertained that you've referenced the flake8 exceptions module since that was entirely my fault and I've regretted it since I released Flake8 3.0.


What if we compromise on this?

Let's define our __init__ methods with a pattern like this:

def __init__(self, message=None, *, fully_qualified_filename=None, ...):
     ...

This will mean that all exception parameters are passed as keyword-only arguments, message is optional so one can still write raise ExceptionClass("TODO: Use the best practice later") but also we can then do raise ExceptionClass(fully_qualified_filename=fqn, ... ) and let the __init__ or some method it calls do the heavy lifting of templating things, etc.

sigmavirus24 avatar Apr 20 '20 13:04 sigmavirus24

I long for a way to compose exceptions in an elegant way rather than the sub-class/parent-child hierarchy that is the de facto standard in Python.

Amen. Thanks again for participating in this exercise!

One can still write raise ExceptionClass("TODO: Use the best practice later") but also we can then do raise ExceptionClass(fully_qualified_filename=fqn, ... ).

This is appealing, though I think it allows someone to write raise ExceptionClass("TODO: Use the best practice later", fully_qualifed_name=fqn). That feels potentially confusing to implement and use.

If you think an existing exception is appropriate for your new area of code, you have to go and figure out "What the heck am I passing to this exception class initializer?"

I think this is an issue with any unfamiliar API. I think having explicit __init_- kwargs could mitigate this, e.g. via editor autocomplete.

If we go the __init__ route, while preserving the ability to pass in a message, then a new contributor could break our agreed-upon pattern of keeping exception messages co-located easily. It also makes sub-classes harder to reason about.

I've been thinking that exceptions with specific messages should not accept an arbitrary message. We could allow for things like raise InvalidDistribution("TODO: Replace me") by keeping those high-level exceptions empty. But, this is flake8/urllib3 pattern that you've regretted, though I'm curious how it's been an issue in practice.

Keeping in mind an API for Twine, having exceptions that are bespoke with their __init__ makes stubbing or mocking infuriating.

I don't follow this; could you elaborate?

This could be mitigated by continuing to show the class name in the output. Or, by extending --verbose to show a traceback.

I think this is a healthy compromise, certainly. Then we also get the exact place that's causing the problem/confusion.

Can you clarify what compromise you mean? Class name in the output, traceback in --verbose, or both?


With all this in mind, I'm still partial to this pattern. But, if you feel strongly about passing in an arbitrary message to any exception, I'm game for def __init__(self, message=None, *, ...).

class InvalidDistribution(TwineException):
    pass


class DistributionFileNotFound(InvalidDistribution):
    def __init__(self, *, filename):
        super().__init__(f"No such file: {filename}")


class InvalidPyPIUploadURL(TwineException):
    def __init__(self):
        super().__init__(
            "It appears you're trying to upload to PyPI (or TestPyPI) "
            "but have an invalid URL.\n"
            f"You probably want: {DEFAULT_REPOSITORY} or {TEST_REPOSITORY}.\n"
            "Check your repository configuration."
        )


raise exceptions.DistributionFileNotFound(filename=filename)
raise exceptions.InvalidPyPIUploadURL()

bhrutledge avatar Apr 22 '20 10:04 bhrutledge

This is appealing, though I think it allows someone to write raise ExceptionClass("TODO: Use the best practice later", fully_qualifed_name=fqn). That feels potentially confusing to implement and use.

I don't think we'll see this in practice.

I think this is an issue with any unfamiliar API. I think having explicit __init_- kwargs could mitigate this, e.g. via editor autocomplete.

Editor auto-complete solves the "what arguments are here?" question but not so much "what's the semantic meaning of these arguments?" That often takes looking at other places an exception is used or committing to excellent docstrings. And only some editors provide hints from doctsrings. Also not everyone uses an editor that provides auto-complete on the arguments so this argument inherently precludes folks who don't use exactly the same tools as us.

I've been thinking that exceptions with specific messages should not accept an arbitrary message. We could allow for things like raise InvalidDistribution("TODO: Replace me") by keeping those high-level exceptions empty.

So let's say there are two worlds (our current one, and the future one we're discussing here). In the current one, InvalidDistribution can have a few different reasons. Accepting the message there makes sense because there is no one specific message. In the future, this particular case goes away because we're discussing sub-classes, but will it always be obvious when we need to subclass an exception? Do we want an exception module with hundreds of classes for the sake of each class having an immutable message that gets templated from the arguments? That feels like overkill to me. It seems more reasonable that we'd continue the pattern we have now and re-evaluate every so often when we recognize there are a few ways of using a class and those should be subclasses for re-usability. Without being able to specify a message, we're going to be proliferating exception classes and that will become hard to reason about too.

I don't follow this; could you elaborate?

Let's say that I'm a 3rd party user integrating with twine via the API. If I know that twine.repository.Repository has a method that can raise an exception but I don't want to necessarily figure out every case where that can be triggered, I might do something like:

r = unittest.mock.Mock(autospec=twine.repository.Repository)
r.upload.side_effect = twine.exceptions.InvalidDistribution()

If our __init__ methods require arguments be passed in then our users have to divine what the types are and what the semantic meaning is to write a simple test. If they can instead just pass a message like:

r = unittest.mock.Mock(autospec=twine.repository.Repository)
r.upload.side_effect = twine.exceptions.InvalidDistribution("Testing what happens when twine raises an InvalidDistribution")

They can be very clear to future contributors what's happening here (if for some reason the test isn't already clear) or they can document that case closer to the code they're using to mock it. It's far from perfect.

Can you clarify what compromise you mean? Class name in the output, traceback in --verbose, or both?

Sorry about that. It was perfectly clear in my head ;). Specifically printing the traceback in --verbose's output.


Re:

class DistributionFileNotFound(InvalidDistribution):
    def __init__(self, *, filename):
        super().__init__(f"No such file: {filename}")

I'm a firm believer that we should save any parameters passed in on the class. Even if I'm just using Twine's already limited API, I have to parse a string (which we change, as proven by a recent PR) to find the information that's important to me and/or my users.

sigmavirus24 avatar Apr 22 '20 11:04 sigmavirus24

I was thinking about this more after my last commit, and was starting to come around to the from_args approach instead of the __init__ compromise, assuming you felt strongly about the arbitrary message and __init__ caveats. Now I'm pretty much sold, but maybe with a little bikeshedding:

class InvalidDistribution(TwineException):
    pass


class DistributionFileNotFound(InvalidDistribution):
    @classmethod
    def format(cls, filename):
        return cls(f"No such file: {filename}")


class InvalidPyPIUploadURL(TwineException):
    @classmethod
    def format(cls):
       return cls(
            "It appears you're trying to upload to PyPI (or TestPyPI) "
            "but have an invalid URL.\n"
            f"You probably want: {DEFAULT_REPOSITORY} or {TEST_REPOSITORY}.\n"
            "Check your repository configuration."
        )


raise exceptions.DistributionFileNotFound.format(filename)
raise exceptions.InvalidPyPIUploadURL.format()

Do we want an exception module with hundreds of classes for the sake of each class having an immutable message that gets templated from the arguments?

I wasn't expecting hundreds, but I was thinking of a class for each distinct message. But, it does make me wonder about something like this:

class InvalidDistribution(TwineException):

    @classmethod
    def no_such_file(cls, filename):
        return cls(f"No such file: {filename}")

    @classmethod
    def unknown_format(cls, filename):
        return cls(f"Unknown archive format for file: {filename}")

raise exceptions.InvalidDistribution.no_such_file(filename)

bhrutledge avatar Apr 22 '20 12:04 bhrutledge

I wasn't expecting hundreds, but I was thinking of a class for each distinct message. But, it does make me wonder about something like this:

So I was thinking further into the future, to be clear. It wouldn't be hundreds today.

Your InvalidPyPIUploadURL is why I'm worried about using a class method in particular (instead of having optional keyword-only arguments that are part of our initializers. We shouldn't have to call .format() here if there's nothing to add to the string (and frankly, .format() would be overloading the common case of str.format() in people's heads). That said, I'd argue that we should be passing in context to exceptions that, to the best of our ability, explain the exception. So in the case of InvalidPyPIUploadURL I'd expect us to pass in the invalid url (and include it in the exception message).

If we take the tact of starting just with __init__ with optional keyword-only arguments, we can always add in class methods or other ways of creating the class. I'll get over myself and call super().__init__ everywhere and just set the attributes (although we could always use attrs to make life better)

sigmavirus24 avatar Apr 23 '20 12:04 sigmavirus24