toga [WIP] Typing ergonomics and guarantees

Changes

Toga's API surface types are now compliant with mypy, except for:
- <enumerate exceptions here>
Closes https://github.com/beeware/toga/issues/2192

Notes

KI

https://github.com/python/mypy/issues/3004
https://github.com/pypa/setuptools_scm/issues/501

Original PR body

## Summary This is a proposal to introduce better typing ergonomics for users and ensure typing accuracy of the Toga API.

Strategy

Introduce tests that specifically exercise the types for the API's interface.

These tests are a little different than what pytest does; they aren't runnable code - instead, they simulate user code and ensure that mypy thinks the API conforms to the expected types.

For instance, this test confirms src accepts a str and size returns a tuple of two ints:

assert_type(toga.Image(src="/data").size, tuple[int, int])

Benefits

The API's typing will be compatible with mypy (and potentially other type checkers)
The API's typing will be accurate...at least insofar as it's typed
The API's typing will be confirmed stable over time

Considerations

Why not just make Toga's codebase pass mypy?
- This will be non-trivial, to say the least...especially given how strictly I want to run mypy against the API's interface
- I'm planning to enable every typing enforcement for mypy that I can for the API tests
- Also, this approach actually verifies the user's experience with the API
  - For instance, running mypy over the Toga's codebase won't tell you that src for toga.Image(src="...") should accept a Path....but running mypy over code using toga.Image can verify that
How will we know if the entire API's typing is checked?
- I'm not sure we can programmatically measure this....but it may be possible through other means 🤔

Thoughts, feedback, criticism welcome before I move forward

PR Checklist:

[ ] All new features have been tested
[ ] All new features have been documented
[X] I have read the CONTRIBUTING.md file
[X] I will abide by the code of conduct

Nov 30 '23 01:11 rmartin16

These are good points. I would like to focus on the abstract goals, though, instead of my quick POC badly attempting some sort of an implementation. So, let's assume the type hints for the actual API (instead of dummy or something else) can be effectively evaluated for this discussion.

Goal

Ensure Toga provides a usable and accurate typing experience for users.

Measuring achievement

This goal would be satisfied when a user application exercising arbitrary components of Toga allows mypy to run to success regardless of strictness settings (where possible and pragmatic - # type: ignore exists for a reason).

Strategy

Toga compliance with mypy

Why can't we use mypy on the test suite as compliance condition?

full mypy compliance in some parts of the repository

These points are obviously valid; my primary contention with them, though, is the burden it bestows on Toga development. Writing Python that's compliant with mypy (even without the strict settings) can be a frustrating experience. This is especially true if you consider some of the more polymorphic code in Toga. Additionally, if you want to encourage outside developers to contribute, typing has a particularly rough learning curve and contributions may be ultimately abandoned because of it.

That said, a combination of mypy compliance in the source and in the tests could achieve the goal. And it's important that we consider running mypy against actual usage of Toga (by evaluating the tests as well) because we want to confirm the actual typing is not just internally consistent but is actually what the API should advertise. In many cases, an API's compliance with mypy with satisfy both of these....but it is also possible to satisfy mypy with typing that doesn't reflect desired usage. This is more likely in complicated typing involving Generics, for instance.

Toga usage compliance with mypy

This is effectively what I'm proposing; that when Toga is used, the APIs satisfy mypy even if Toga doesn't itself. It is essentially a subset of the solution space of "run mypy over the tests" but it wouldn't require all of the tests to satisfy mypy.

Yes, if I understand correctly, this is only making assertions about the type hints themselves, but not whether those hints are actually being followed. It seems like a lot of work for no clear benefit to add hundreds of assertions that essentially say "an argument or return type hinted as X is hinted as X", but don't add any additional confidence that an X will actually be accepted or returned by the code when it's run.

This is a good point; in some of my examples in my POC implementation, the assertion is trivial and isn't likely to provide enough value to warrant implementing hundreds of them. (Although, I will note it provoked fixing (int, int) to tuple[int, int]'.) I think what this could suggest is a strategy that would be limited to less straightforward APIs that would also usually be polymorphic in nature.

However, it is also not necessarily the goal of this type checking to ensure the code complies with it. This is part of the balance I talk about in the next section.

Implementation

I see levels, I suppose, of implementation options. They all come with their own trade offs (some mentioned above) and ultimately, this may come down to finding the right balance of achieving the fullest heights of the goal with the pragmatic situation of reality and existing tooling.

Un-automated and ad-hoc type testing

I would consider this the 80/20 solution. Basically, you write a quick Toga app implementing all (or at least the complicated) APIs and ensure it can pass mypy. This is, more or less, how I've come to proposing a solution at all; I will have a Toga app open in PyCharm and I start seeing squiggles under my variables because the typing isn't right.

If nothing else, such an approach would head off many of the issues users would create. Moreover, though, a related concern is that users don't report them and instead, Toga just has a lower reputation for them.

Automated type testing without known coverage

Incorporate the idea above in to CI. I haven't seen any tooling that would allow us to know if we're actually testing everything. However, if that was the goal, then I think the recommendation would just be "run mypy on your code". So, I would consider the intention here to make sure sensitive or error-prone code is being exercised.

Toga type checking

This is the "true" solution of just getting Toga to pass mypy...which I've discussed.

Other comments

In particular, a quite common class of bugs is where the public API in the core returns an object it got from the backend, but the backend doesn't comply with the public hint (e.g. #1779). It would be nice if things like that could be caught by a static type checker, rather than needing an isinstance check next to every equality check in the testbed. This would require formalizing the core / backend interface, which is mostly duck-typed at present, with a set of type-hinted base classes, which is probably a good idea even if we don't end up running mypy at all.

Agreed. My approach here is more of an interim solution; however, it'd only be interim if someone actually does all this work to type Toga soup to nuts at some point.

I'm mostly targeting the immediate user experience of trying to use Toga with mypy or even in a modern IDE. To that end, if you don't have Pylance enabled in VS Code, I would recommend turning it on so at least the type issues it can find are elevated at dev time.

Final Thoughts

At the end of the day, this is not a silver bullet or even a pretty stopgap. The best experience for typing with Toga would be facilitated by fully implementing typing within Toga and ensuring the code and its tests pass mypy. That is no small order, though. So, this is mostly trying to find a middle ground where at least some of the typing for the API's interface can be guaranteed and user ergonomics for implementing type checking in their projects using Toga is as painless as possible.

Nov 30 '23 17:11 rmartin16

These are good points. I would like to focus on the abstract goals, though, instead of my quick POC badly attempting some sort of an implementation. So, let's assume the type hints for the actual API (instead of dummy or something else) can be effectively evaluated for this discussion.

To be clear - dummy is a valid backend - it's just not a backend most people will use in practice. It should match the same typing interface as a "real" backend.

Goal

Ensure Toga provides a usable and accurate typing experience for users.

Agreed - although I'd qualify that slightly by saying that while typing can be a useful mechanism for providing IDE hints and identifying some classes of bugs, it's not a panacea.

I'm interested in improving typing to the extent that it improves the lived experience of Toga developers (both those developing with Toga, and those developing Toga itself). I'm not especially interested in expending excessive effort just so we can metric of "type purity". Practicality beats purity.

Measuring achievement

This goal would be satisfied when a user application exercising arbitrary components of Toga allows mypy to run to success regardless of strictness settings (where possible and pragmatic - # type: ignore exists for a reason).

+1. +2 to the "pragmatic" part :-)

Strategy

Toga compliance with mypy

Why can't we use mypy on the test suite as compliance condition? full mypy compliance in some parts of the repository These points are obviously valid; my primary contention with them, though, is the burden it bestows on Toga development.

+1

Toga usage compliance with mypy

This is effectively what I'm proposing; that when Toga is used, the APIs satisfy mypy even if Toga doesn't itself. It is essentially a subset of the solution space of "run mypy over the tests" but it wouldn't require all of the tests to satisfy mypy.

It might be a lack of imagination on my part, but I'm having difficulty working out what this would look like. If Toga's test suite (core + testbed) is exercising API and every code path, I'm not sure I see how the tests could satisfy mypy if Toga doesn't.

Implementation

Un-automated and ad-hoc type testing

I would consider this the 80/20 solution. Basically, you write a quick Toga app implementing all (or at least the complicated) APIs and ensure it can pass mypy.

I can see how this would give us a quick win (or at least let us quickly identify problems that exist right now). My concern would be maintaining this sample app over time. I'm not wild about the idea of a new project that exists purely for testing purposes that isn't executed as part of automated testing except as part of mypy.

However, I wonder if there's a two-birds-one-stone solution here: We have toga-demo, but it is... not a very good demo :-) A much better demo would be a "widget gallery" - a live demonstrator of every widget that Toga contains (or, at least, a placeholder that says "this widget doesn't exist". This app could be uploaded to app stores, or made available for download. If it also gives us an opportunity to do a practical type check... ?

The reason we haven't historically done this is that we don't have a widget to use for the top-level navigation of such an app. We need the ability to display a list of widgets to demonstrate, which links to a page that demonstrates that widget.

Automated type testing without known coverage

Toga type checking

I'm not sure I understand why these two are in this order in terms of complexity. I would have thought internal type completeness in Toga would have been easier, as the external type surface isn't being tested.

Final Thoughts

At the end of the day, this is not a silver bullet or even a pretty stopgap. The best experience for typing with Toga would be facilitated by fully implementing typing within Toga and ensuring the code and its tests pass mypy. That is no small order, though. So, this is mostly trying to find a middle ground where at least some of the typing for the API's interface can be guaranteed and user ergonomics for implementing type checking in their projects using Toga is as painless as possible.

I'm definitely up for a "gradual improvement" approach. My concern is around ensuring that the approach we take is "self-sustaining" - that it doesn't involve adding something to a test suite that then starts to bitrot because there's no reason to ensure that it remains current and accurate.

Dec 01 '23 02:12 freakboy3742

I'm not sure I see how the tests could satisfy mypy if Toga doesn't.

I would have thought internal type completeness in Toga would have been easier, as the external type surface isn't being tested.

I think I should clarify my position here. I am saying 1) my stated goal can be achieved with only type completeness of Toga's API surface and 2) this is far more feasible than type completeness of Toga, at large.

The primary piece of prior art for this strategy is Python's typeshed itself. It provides type completeness for the API surface of many packages and allows downstream developers to implement them with total typing compliance of their own code. This is, of course, while many of the packages represented by the typeshed are not typed themselves at all.

Therefore, as made evident by typeshed, type completeness of an API surface is an abstract subset of type completeness for the entire API (and its supporting code base). When a developer runs mypy on their code that's using Toga, mypy doesn't recursively review the internals of all function calls to check if they are internally consistent about the types they are declaring at their API surface; mypy takes the declared types and assumes they are correct.

Finally, retroactively achieving type completeness of a large code base long in to its existence is likely to be a perilous mission. When you write Python code without typing in mind from the get go, you're likely to end up with code that isn't compatible with type checking via static analysis. A good example is Briefcase; I never finished typing all of Briefcase because it strongly leverages composable classes via mix-ins, incompatible subclassing, and several of the important data structures are loosely defined. So, I would have had to create all of these awkward Protocols to structurally define the larger objects in play and likely do a bunch of refactoring. (Of course, none of this intended to reflect badly on Briefcase...it's just that type completeness inherently constrains what you can do with Python.)

At any rate, I feel confident saying achieving type completeness of Toga, in general, would be a significant task...and also one I am unlikely to take on. To get a taste of this, just run mypy on select files in Toga, e.g. core/src/toga/app.py; some of the errors are trivial...others may require refactoring....then you ask yourself "why am I refactoring working code....what am I getting". Of course, as Malcolm pointed out, targeted implementations such as abstract base classes for backends almost certainly make sense.

I'm definitely up for a "gradual improvement" approach. My concern is around ensuring that the approach we take is "self-sustaining" - that it doesn't involve adding something to a test suite that then starts to bitrot because there's no reason to ensure that it remains current and accurate.

Yeah; I understand this as a competing priority. I think the reality is, unfortunately, that there will be bitrot either way; if we don't verify or test the API surface types, they will bitrot...and if we do verify them with some sort of awkward pseudo-test app that won't have any coverage, it will bitrot. We're then full circle with "make Toga type complete" which I think is likely to be a task of dubious merit.

I do like the idea of the demo app potentially serving a dual purpose to verify Toga's API typing. However, I don't think I'm keen to take that on as part of this.

So, given all this, I can probably commit, at least, to a survey of Toga's current API surface types with the aim to:

confirm existing typing
correct current typing gaps
identify areas that may require more attention

Dec 01 '23 15:12 rmartin16

I think I should clarify my position here. I am saying 1) my stated goal can be achieved with only type completeness of Toga's API surface and 2) this is far more feasible than type completeness of Toga, at large.

This all makes sense. I hadn't considered Typeshed as a point of comparison, and yeah - I can now see how an external API surface can be type complete without the internals being complete.

At any rate, I feel confident saying achieving type completeness of Toga, in general, would be a significant task...and also one I am unlikely to take on. To get a taste of this, just run mypy on select files in Toga, e.g. core/src/toga/app.py; some of the errors are trivial...others may require refactoring....then you ask yourself "why am I refactoring working code....what am I getting". Of course, as Malcolm pointed out, targeted implementations such as abstract base classes for backends almost certainly make sense.

And that's my general criticism of typing as an approach. There are times where it's very helpful. There are times when it is in direct conflict with the flexible-duck value proposition that is a core of Python's appeal.

I do like the idea of the demo app potentially serving a dual purpose to verify Toga's API typing. However, I don't think I'm keen to take that on as part of this.

So, given all this, I can probably commit, at least, to a survey of Toga's current API surface types with the aim to:

confirm existing typing

correct current typing gaps

identify areas that may require more attention

That seems like the best way forward to me. There's definitely stuff we can fix right now, but we don't need to swallow the full mypy elephant in one sitting; and we have a goal we can work towards in terms of a demo app (and any other areas that might be identified as part of a once-off type audit) that might provide a long term way to make mypy palatable.

Dec 02 '23 01:12 freakboy3742

So, given all this, I can probably commit, at least, to a survey of Toga's current API surface types with the aim to:

confirm existing typing

correct current typing gaps

identify areas that may require more attention

These all seem like worthwhile goals. You've probably worked this out already, but the type hints currently in the API were written mainly for the purpose of generating the documentation rather than being logically rigorous.

In fact, there were a couple of times where I deliberately made the hints looser because the "correct" form was unreadable in the docs. Type aliases are supposed to be the solution to this, but Sphinx doesn't seem to support them very well (e.g. https://github.com/beeware/toga/pull/1984#discussion_r1261112175).

Dec 02 '23 18:12 mhsmith

These all seem like worthwhile goals. You've probably worked this out already, but the type hints currently in the API were written mainly for the purpose of generating the documentation rather than being logically rigorous.

In fact, there were a couple of times where I deliberately made the hints looser because the "correct" form was unreadable in the docs. Type aliases are supposed to be the solution to this, but Sphinx doesn't seem to support them very well (e.g. #1984 (comment)).

Thank you for bringing up docs; I hadn't properly included them in my calculus. I will definitely incorporate them as part of the pragmatism of this implementation.

Dec 02 '23 20:12 rmartin16

Type aliases are supposed to be the solution to this, but Sphinx doesn't seem to support them very well (e.g. #1984 (comment)).

Want to mention this now that I've seen how fickle Sphinx is with rendering TypeAliases.

It turns out that Sphinx is very sensitive to how the TypeAlias is defined. Notably, a TypeAlias that's used in an API signature cannot use forward references or mixed definitions.

That is, "forward references" will not work:

ActionHandler: TypeAlias = "ActionHandlerType1 | ActionHandlerType2"

class ActionHandlerType1(Protocol): ...
class ActionHandlerType2(Protocol): ...

You need to ensure ordering to avoid the forward references:

class ActionHandlerType1(Protocol): ...
class ActionHandlerType2(Protocol): ...

ActionHandler: TypeAlias = "ActionHandlerType1 | ActionHandlerType2"

Furthermore, though, the TypeAlias definitions cannot be robust, if you will, if expressed as a string; for instance:

T = TypeVar("T")

class ActionHandlerType1(Protocol[T]): ...
class ActionHandlerType2(Protocol[T]): ...

ActionHandler: TypeAlias = "ActionHandlerType1[T] | ActionHandlerType2[T]"

I was only able to get this to work as:

T = TypeVar("T")

class ActionHandlerType1(Protocol[T]): ...
class ActionHandlerType2(Protocol[T]): ...

ActionHandler: TypeAlias = Union[ActionHandlerType1[T], ActionHandlerType2[T]]

Therefore, TypeAliases can be reliably rendered (ostensibly anyway) in the docs if avoid forward references and use explicit Unions.

P.S. You also must build the docs using Python 3.10 or later....which we always do anyway via .readthedocs.yaml.

Dec 03 '23 16:12 rmartin16

Therefore, TypeAliases can be reliably rendered (ostensibly anyway) in the docs if avoid forward references and use explicit Unions.

That specific exercise smells a bit like a bug to me.

However, this is also getting to the point where I'm not sure we're actually gaining anything by adding strong type declarations. If Toga's types are useful as a way to pass edge cases upstream to the people working on typing to diagnose bugs, then that's great - but I'm not wild about the idea of putting typing's bleeding edge on our critical path.

Dec 04 '23 00:12 freakboy3742

Therefore, TypeAliases can be reliably rendered (ostensibly anyway) in the docs if avoid forward references and use explicit Unions.

That specific exercise smells a bit like a bug to me.

These are all Sphinx bugs; there are several logged issues I found related to TypeAliases, forward references, etc. However, if you avoid using all these things together and just stick to basic typing and assignments, then this all works as expected for documentation; given that, I wouldn't necessarily consider this bleeding edge, per se....perhaps just a bit rough around the edges.

However, this is also getting to the point where I'm not sure we're actually gaining anything by adding strong type declarations. If Toga's types are useful as a way to pass edge cases upstream to the people working on typing to diagnose bugs, then that's great - but I'm not wild about the idea of putting typing's bleeding edge on our critical path.

Well, I think we are gaining something (other than typing purity, that is); we are getting useful and accurate typing for handlers...which was one of the primary impetuses for me to do this (i.e. #2192). I'd also argue this use case isn't really an edge case; right now, the typing for handlers doesn't support async (and, less importantly, generators).

More ideologically, though, I would put forth that inaccurate typing is worse than no typing. Most IDEs are easily configured (some even out-of-the-box) to surface typing issues for Python to the developer. Therefore, if Toga is going to type its public API, I think it should be accurate.

Now...I do not intend this to mean that every type has to be accurate to the extent of the true strictness of the underlying code...but it should still be accurate in its permissiveness. For instance, if we consider handlers, I've implemented an accurate (and complete) typing for them (that's also not overly burdensome, IMHO). However, if we think this goes too far with typing, then we should just type them as Callable; this is as accurate as you can be at this level while still truly representing the actual permissiveness of the code. (I'd have to confirm but Callable[[App], ...] would probably work in cases where positional arguments are used.)

(meta: we're starting to enter the "pragmatism" debate of this PR :) so, with that, I would like to try to establish some guiding principles to help us balance Toga's typing. Hence, I'm putting forth that irrespective of how "true" a particular type is, its permissiveness should always be accurate.)

Dec 04 '23 16:12 rmartin16

(meta: we're starting to enter the "pragmatism" debate of this PR :) so, with that, I would like to try to establish some guiding principles to help us balance Toga's typing.

+1

Hence, I'm putting forth that irrespective of how "true" a particular type is, its permissiveness should always be accurate.)

Agreed. Elaborating further on this, I'd give high level goals of:

Clarity (i.e., you shouldn't need a 3 year degree in semiotics to be able to read the Sphinx rendition of a type)
Accuracy (i.e., a type declaration shouldn't reject valid inputs)
Specificity (i.e., a type declaration shouldn't accept invalid inputs)
Completeness. (i.e., we'd rather a type not be declared, or be declared loosely, than declared in an incorrect or unclear way)

Balancing the relative priorities of those will be a bit of a balancing act, but they're at least principles to start with (or, at least, principles to start a discussion about on the way to a final set :-)

Dec 04 '23 22:12 freakboy3742

Note about the current state:

You may notice changes that aren't particularly in line with the stated goal of "typing the API surface of Toga". I'm currently experimenting with different functionality of mypy but I will most likely target an end state that will not include functional changes to code or type: ignore comments.

Dec 10 '23 01:12 rmartin16

The Return

I rebased this on main today. Originally, I was waiting on some refactors to continue this but I'm not sure where those are at. Nonetheless, I'm planning to make the push now to decide on how this will be accepted...or rejected (because keeping it up to date is quite a bit of work).

As the next step, I plan to put forth some of the bigger questions and issues with this approach. That should help us determine whether the current scope of typing in this PR is desirable for the project. From there, it should hopefully be straightforward (yet tedious) to review the specific changes.

Mar 26 '24 23:03 rmartin16

The Return

I rebased this on main today. Originally, I was waiting on some refactors to continue this but I'm not sure where those are at.

If you're referring to the app base class refactor - that's on the todo list for this quarter. The "final state" should be close(ish) to what is on #2244 right now; but as noted on that PR, it will likely be split up into 10-ish PRs to get there in a reviewable way.

The good news is that there aren't that many new APIs - and the ones that are new mostly follow familiar patterns. The most invasive change - reordering the classes - has already been merged to main.

Are there any other big refactors that you're talking about?

As the next step, I plan to put forth some of the bigger questions and issues with this approach. That should help us determine whether the current scope of typing in this PR is desirable for the project. From there, it should hopefully be straightforward (yet tedious) to review the specific changes.

Sounds like a plan. I definitely don't want you to burn a bunch of time on extensive typing changes if they end up not being merged, so making sure we've got a high-level agreement on the approach (and related issues) sounds like the best next step.

Mar 27 '24 00:03 freakboy3742

Are there any other big refactors that you're talking about?

I think it was mostly the app base class changes...but I also just got focused on other things.

I definitely don't want you to burn a bunch of time on extensive typing changes if they end up not being merged

I've spent most of my time on the API signatures themselves and I think we'll find agreement most of those changes should be merged at a minimum. As for the mypy compliance stuff, I've mostly added TODO comments to figure out what to do later.

Mar 27 '24 00:03 rmartin16

Summary

The primary goal of this PR is to ensure accurate typing for the API surface. Secondary goals are to ensure decent ergonomics of using Toga within a fully typed project as well as providing a mechanism to enforce typing accuracy in to the future.

The current implementation tries to achieve all of these goals by requiring Toga core to pass a somewhat strict configuration of mypy.

Implications

Developers using Toga gain robust typing assurances for their projects
Developers of Toga must understand and implement typing for all changes
- This will definitely be burdensome and present a barrier for new submissions for newer devs
There are several gaps in Python's typing system this project stumbles upon

Known Issues

Proper event handler typing is verbose and a bit complex
- Any handler can be sync, async, or a generator; therefore, all handler typing definitions require 3 Protocols to define each possible one and a TypeAlias to capture their Union for use in signatures
- A good example is the OnExitHandler near the top of app.py
- Using the current single Protocol implementation of handler typing does not support all possible types of handlers and will be flagged by mypy in users' projects
Property setters/getters are not well-supported
- https://github.com/python/mypy/issues/3004
- Basically, Python's typing system does not properly support property setter/getter where the accepted types for the setter do not match the type of the getter.
- This style is used prolifically in Toga and requires much of class initialization to have type: ignore.
  - A possible workaround to this is using type(self).<prop name>.fset(self, init_value)....which is entirely intuitive and great...
Python 3.8 (and maybe Python 3.9) does not support indexing in type annotations
- Many of the classes in collections.abc are in the typing....however, for ones that aren't, it is not possible to bound the types for those containers. For example, WeakValueDictionary.
Interactions with the underlying implementations are untyped
- Uses of _impl are nearly all inherently problematic without some form of unification of backend definitions
Right now, I've targeted typing accuracy over readability...so, can tone back some of these types (looking at you data sources) in a future iteration of this review if necessary

Questions

While not immediately pertinent, the use of Any vs object will need to be adjudicated
- In short, these both allow any object, but object limits how the object can then be used
Should -> None be enforced for implicit return values?
Type variables and aliases naming; should we append T or not to differentiate Types from Classes?

This is most of my high level thinking right now. The docs already need some work to use these TypeAliases...although, that's already the case (e.g. IconContent in toga.app.App [link]). I'm also interested in the questions or barriers you foresee for this work....ultimately, is the benefits of type compliance to users worth the additional development requirements?

Mar 27 '24 23:03 rmartin16

Summary

The primary goal of this PR is to ensure accurate typing for the API surface. Secondary goals are to ensure decent ergonomics of using Toga within a fully typed project as well as providing a mechanism to enforce typing accuracy in to the future.

Ultimately, I think this statement of priorities will end up being the root of most of the disagreement about this PR. From my perspective, developer ergonomics should come first.

Known Issues

Proper event handler typing is verbose and a bit complex

This is probably my biggest complaint - the end result of a "type correct" handler declaration is almost unusable from a documentation perspective. The OnPressHandlerT isn't a hyperlink when it's used in docs, and there are three, almost identical declarations for a Sync, Async and Generator handlers. While I accept this is technically correct from the perspective of a strict type checker, it completely fails as an aid to end-developers.

Is there's anything that can be done here with a decorator - i.e., if I define a single handler, and decorate it with a method that dynamically declares the other forms, will that satisfy mypy? How "static" is mypy's static analysis?

Property setters/getters are not well-supported

Yeah, this is a problem, even with the existing code. The docs are effectively incomplete, not declaring the full scope of what will be accepted. However, it matters a lot less because it's not unexpected for Python to duck type where it can.

As with the handler declarations, I wonder if a decorator could semi-automate the fset workaround?

Python 3.8 (and maybe Python 3.9) does not support indexing in type annotations

Many of the classes in collections.abc are in the typing....however, for ones that aren't, it is not possible to bound the types for those containers. For example, WeakValueDictionary.

Interactions with the underlying implementations are untyped

Agreed this is a gap - however, there is a protocol that the backends should be adhering to. It wouldn't be the worst idea for this protocol to be formalised. We're currently testing this by way of testbed runtime coverage, but we've had multiple occasions where we've missed updating the web and textual backends because they're not exercised by coverage.

Right now, I've targeted typing accuracy over readability...so, can tone back some of these types (looking at you data sources) in a future iteration of this review if necessary

As described above - I think this is a dial that needs to turned back the other way a bit.

Questions

While not immediately pertinent, the use of Any vs object will need to be adjudicated

In short, these both allow any object, but object limits how the object can then be used

Can you clarify what restrictions using object would enforce? In most of the places we're using Any at present, I think the reasoning has been "the value can be anything", which would seem to imply "...and we don't really know what is going to be done to it".

Should -> None be enforced for implicit return values?

If -> None is the default interpretation of MyPy, I'd be inclined to avoid the extra keyboard busy work.

I could probably also be convinced that if a method actually includes return None, then we should explicitly annotate that. A method that just falls off the end, or does a bare return as a shortcut, we don't annotate.

However, I'm also willing to be convinced otherwise on this.

Type variables and aliases naming; should we append T or not to differentiate Types from Classes?

This seems worthwhile to me - If you can't actually instantiate the type, flagging that makes sense.

This is most of my high level thinking right now. The docs already need some work to use these TypeAliases...although, that's already the case (e.g. IconContent in toga.app.App [link]). I'm also interested in the questions or barriers you foresee for this work....ultimately, is the benefits of type compliance to users worth the additional development requirements?

FWIW: a lot of the minor type fixes aren't controversial to me at all. As I've flagged previously, my only real concern with changes of this type is that without an automated mechanism for ensuring type correctness, these are going to drift out of correctness over time. However, even with that caveat, I think I'd be happy merging a version of this PR that had everything except the Handler type changes, the mypy # type: annotations (which are just noise if we don't have mypy in the stack), and the handful of TODOs flagged inline. I need to do a close review to be 100% certain, but from my initial pass, I don't see anything outside those two that seem especially problematic. The usage of generics in Sources (and related changes to Selection) is the only part that needs an especially close inspection AFAICT - I can't spot anything obviously wrong, but there might be something subtle in a deeper dive.

The bigger changes (like the triplicate handler definitions) are where you lose me. The bigger issue is the verbosity of the handler declarations, and whether we can find a mechanism that preserves documentation utility while also preserving type correctness.

Apr 02 '24 02:04 freakboy3742

From my perspective, developer ergonomics should come first.

Can you elaborate? From one perspective, inaccurate typing can really contribute to not being ergonomic for developers; where I started this was from my IDE complaining quite a bit while trying to use Toga. So, I think it'll be helpful in finding middle ground to understand which ergonomics you're considering a higher priority.

As for the larger picture, I don't think I'm likely to push back on a lot. There are already not insignificant documentation issues for types (e.g. IconContent, ImageContent, ImageT, etc. aren't linked...although, we may be able to find some solution) and current Python typing support has a lot of gaps that conflict with Toga's design/implementation.

So, in a lot of ways mostly what I'm asking right now is: this is what typing verification/enforcement would mostly look like....does this introduce a reasonable amount of overhead for what we're getting?

If not (and I'll be honest....I'm not entirely sold this is the way to go with the current state of Python typing), then we can drop typing verification/enforcement and just add the API surface type updates.

Apr 02 '24 02:04 rmartin16

From my perspective, developer ergonomics should come first.

Can you elaborate? From one perspective, inaccurate typing can really contribute to not being ergonomic for developers; where I started this was from my IDE complaining quite a bit while trying to use Toga. So, I think it'll be helpful in finding middle ground to understand which ergonomics you're considering a higher priority.

That's true of some developer tooling. VSCode doesn't complain about the current types - or, at least, it doesn't for me. I also haven't spent a lot of time trying to configure VSCode to do strong type checking everywhere, because my personal experience has been that strong typing in Python doesn't actually produce better code for the sort of code we're writing here - it produces a lot of headaches and accounting, without actually resulting in better code. I'm not especially disposed to changing our entire development process to quieten a specific subset of IDEs that are trying to make Python a strongly typed language so that their autocomplete mechanisms work better.

I do accept that typing notation is a very useful way to shorthand documentation - adding : int | None is a much nicer than repeating "accepts an integer or None" in a text description. However, for me, that fidelity stops when we spend more time trying to make the typing system happy than we do actually writing error free code.

Ultimately, I guess it really comes down to documentation. If I need to have a degree in semiotics to understand how to interpret the the typing annotation, then the typing annotation has failed in its task of effectively communicating.

For my money, that's what's happened with the Handler type. The current docs might not be strictly type correct, but the are pretty clear - a handler is a protocol describing a callable with some specific arguments. We're able to communicate externally that this also includes async and generator forms, with the same argument list. There's one entry in the docs index per handler type; and that value is linked as needed.

The new docs involve three nearly identical, but slightly different classes, none of which are linked from the place where they're actually used. It might be "type correct", but won't actually help anyone understand what is going on. The documentation is failing in it's role as a communication mechanism.

As for the larger picture, I don't think I'm likely to push back on a lot. There are already not insignificant documentation issues for types (e.g. IconContent, ImageContent, ImageT, etc. aren't linked...although, we may be able to find some solution) and current Python typing support has a lot of gaps that conflict with Toga's design/implementation.

Hrm... I thought they were (or, at least, I recall spending a lot of time trying to make them link and work)... if they're not, that's definitely something we need to look into.

So, in a lot of ways mostly what I'm asking right now is: this is what typing verification/enforcement would mostly look like....does this introduce a reasonable amount of overhead for what we're getting?

If not (and I'll be honest....I'm not entirely sold this is the way to go with the current state of Python typing), then we can drop typing verification/enforcement and just add the API surface type updates.

I think that's the crux of things - Python isn't a strongly typed language, and the way typing is being introduced piecemeal doesn't currently provide the environment where being pedantic about "strict validated typing" is actually helpful. I'm definitely up for making a bunch of minor corrections, and fleshing out the places where we've cheated by using callable rather than a protocol - most of which actually represent places where our current "documentation-based" typing is either incorrect or absent. Those changes all seem worthwhile; the larger automated goal... less so.

Apr 02 '24 07:04 freakboy3742

hmm...so, I think we agree that documentation should be clear...and we probably have more common ground than not that typing shouldn't compromise documentation. I think we disagree on how far we're willing to compromise the typing for the sake of documentation.

First, IMHO, I think you may be misrepresenting the value of accurate typing to users. I've had the benefit of real-time type checking since I started using Python and I believe it contributes significantly to helping write code right the first time and having readily available information about the arguments for an API; this is most notable for the stdlib as the typeshed has filled out over time. Furthermore, though, this isn't just about real-time type checking; this is also about type-enforced projects wanting to use Toga. If Toga's API typing isn't accurate, then static type analysis of its use will be a frustrating experience. I don't say this to mean this should be the highest priority irrespective...but that a bit more is being left on the table than IDE autocomplete.

Second, I probably should have considered my changes to the docs more carefully given how heavily it's being considered in this review. More than anything, I was just trying to get the docs to build while I updated the types. AFAICT, the primary concern is communicating semantically complex types to users while in reality the situation isn't really as complex as it seems.

So, hypothetically speaking, if the documentation for these types can be made satisfactorily pleasant, would their implementation be considered suitable? Or are these typing implementations inherently unsuitable? For instance, a simple synchronous version of a handler being shown in the docs while the actual typing represents the true situation.

Apr 03 '24 15:04 rmartin16

First, IMHO, I think you may be misrepresenting the value of accurate typing to users. I've had the benefit of real-time type checking since I started using Python and I believe it contributes significantly to helping write code right the first time and having readily available information about the arguments for an API; this is most notable for the stdlib as the typeshed has filled out over time. Furthermore, though, this isn't just about real-time type checking; this is also about type-enforced projects wanting to use Toga. If Toga's API typing isn't accurate, then static type analysis of its use will be a frustrating experience. I don't say this to mean this should be the highest priority irrespective...but that a bit more is being left on the table than IDE autocomplete.

That's fair; although I'd note that I've got 15-odd years of using experience using strongly typed languages prior to using Python (in which I've also got almost 20 years), and I can assure you that you can write plenty of wrong code even when you have strong types :-) If anything, adding types to Python concerns me specifically because it gives a certain type of developer a sense that their code "can't possibly be wrong, because it passes the type checker".

Type checking definitely has a place, and the larger a project, the more useful type checking will be. That's mostly because type checking requires formalisation of some of the implicit "duck" properties in an API that Python encourages. However, those duck properties are a good part of what makes Python attractive as a language, and many of them don't lend themselves well to formalisation, especially in the current state of typing. "Liberal in what they accept, conservative in what they produce" properties are a great example of this - trivial to explain, clearly a benefit to developer ergonomics, but near impossible to formally annotate. My concern is that, in achieving "formal type correctness", we either make the code prohibitively difficult to maintain, or we require that we lose features that are beneficial to the developer experience (including readability of documentation).

So, hypothetically speaking, if the documentation for these types can be made satisfactorily pleasant, would their implementation be considered suitable? Or are these typing implementations inherently unsuitable? For instance, a simple synchronous version of a handler being shown in the docs while the actual typing represents the true situation.

A qualified... maybe. Taking handlers as an example - assuming we can make the docs return something ergonomic, I'm still not wild that every handler definition requires three times as much code, most of which is almost entirely boilerplate reproduction pattern. However, if that's the price we need to pay to get automated verification, maybe I can learn to live with it...

Apr 04 '24 01:04 freakboy3742

I've re-positioned this PR just to update the types for the existing API surface. I decided to exclude handler typing so it can be addressed separately while landing the rest of this finally. At this point, I just need to ensure some of the more complicated typing is correct.

As a note, I discovered we need to be careful with the abstract types; notably, Iterable. It's an incredibly versatile type in Python but it is absolutely not safe to use it for variables that are coerced to a boolean.

For instance, this is a common pattern:

def foo(bar: Iterable | None = None) -> None:
    print(f"{bar=}")
    if bar:
        print("True!")
    else:
        print("False!")

>>> foo(bar=[1, 2, 3])  # True
bar=[1, 2, 3]
True!
>>> foo(bar=[])         # False
bar=[]
False!
>>> foo(bar=filter(None, [None]))  # ?
bar=<filter object at 0x77f910491db0>
True!
>>>

So, obviously, not all iterables coerce to a boolean like you might initially think. Therefore, these coercions should instead check for the None identity or use the Collection, Sequence, etc. types instead.

While I've addressed some of these cases already, I'm planning to go back through and probably update most of them to just check for None explicitly.

Jun 03 '24 23:06 rmartin16

I spent some time with Sources today.

My goal was to add typing generics to these classes so static code analysis could know the arbitrary types of the data inside ListSource, TreeSource, etc. However....their current implementations do not allow static analysis to infer the types.

If we consider ListSource, the issue stems from the handling of the user provided data which can be:

Iterable[Mapping[str, T]]
Iterable[Iterable[T]]
Iterable[T]

If you try to union these types, the type of T always devolves to Any because their individual "type spaces" (if you will) overlap. That is, an iterable of mappings is a conceptual subtype of an iterable of iterables. That can be avoided, though, if we require an iterable of collections instead. However, the fact that the user-provided data can also just be an iterable of objects....messes this all up and mypy can't figure out what's going on.

That said, if we ignore the "iterable of objects" case, then using Iterable[Mapping[str, T]] | Iterable[Collection[T]] to describe the acceptable user input does work and mypy can track the data types of the values in the data. So, that would at least address some situations...

A similar issue exists for TreeSource since a simple iterable of objects can create the tree.

Alternatively, I suppose, we could re-implement their designs.....and instead, perhaps, use specific constructors based on the type of input. These would obviously be pretty disruptive as well as a worse user experience...

Jun 07 '24 00:06 rmartin16

Alternatively, I suppose, we could re-implement their designs.....and instead, perhaps, use specific constructors based on the type of input. These would obviously be pretty disruptive as well as a worse user experience...

FWIW: I'm not fundamentally opposed to some redesign here if it will help - however, I suspect this is just one of those areas where we see the hard edges of strong typing. We're deliberately exposing an interface that is flexible, which is the exact opposite of what strong typing wants to do.

Jun 07 '24 00:06 freakboy3742

Alternatively, I suppose, we could re-implement their designs.....and instead, perhaps, use specific constructors based on the type of input. These would obviously be pretty disruptive as well as a worse user experience...

FWIW: I'm not fundamentally opposed to some redesign here if it will help - however, I suspect this is just one of those areas where we see the hard edges of strong typing. We're deliberately exposing an interface that is flexible, which is the exact opposite of what strong typing wants to do.

At this point, I'm thinking I'm just going to remove the ListSource and TreeSource typing from this PR. I was hoping to get to a final design for them today....but I don't really have one. So, instead of continuing to hold all of this up, I'll try to finalize what is ready to go tomorrow.

Jun 07 '24 00:06 rmartin16

The typing updates to the code are complete at this point; I removed the attempt to make ListSource and TreeSource generic. I do still need to assess any impact to documentation from the new TypeAliases or TypeVars, though..

Jun 08 '24 00:06 rmartin16

I reviewed all of the TypeAlias definitions and their documentation; they should all be in alignment now based on how the existing ones were documented. As such, the PR is done....now if the docs build will just be allowed to finish....

Jun 10 '24 22:06 rmartin16

Looks like the RTD problem is because Microsoft's web servers are either being flaky, or they've blackholed a URL we were using....

Jun 10 '24 22:06 freakboy3742

Just saw this in that last re-run for Windows...curious to see a stacktrace in the tests....

tests/widgets/test_table.py::test_remove_all_columns Traceback (most recent call last):
PASSED  File "D:\a\toga\toga\testbed\build\testbed\windows\app\src\app_packages\toga_winforms\widgets\table.py", line 176, in _new_item
    icon = icon(self._accessors[0])
  File "D:\a\toga\toga\testbed\build\testbed\windows\app\src\app_packages\toga_winforms\widgets\table.py", line 90, in winforms_retrieve_virtual_item
    e.Item = self._new_item(e.ItemIndex)
  File "D:\a\toga\toga\testbed\build\testbed\windows\app\src\app_packages\toga_winforms\libs\wrapper.py", line 22, in __call__
    return function(*args, **kwargs)
   at Python.Runtime.PythonException.ThrowLastAsClrException() in /tmp/build-via-sdist-h3j67_wc/pythonnet-3.0.3/src/runtime/PythonException.cs:line 53
   at Python.Runtime.Dispatcher.TrueDispatch(Object[] args) in /tmp/build-via-sdist-h3j67_wc/pythonnet-3.0.3/src/runtime/DelegateManager.cs:line 341
   at Python.Runtime.Dispatcher.Dispatch(Object[] args) in /tmp/build-via-sdist-h3j67_wc/pythonnet-3.0.3/src/runtime/DelegateManager.cs:line 208
   at __System_Windows_Forms_RetrieveVirtualItemEventHandlerDispatcher.Invoke(Object , RetrieveVirtualItemEventArgs )
   at System.Windows.Forms.ListView.OnRetrieveVirtualItem(RetrieveVirtualItemEventArgs e)
   at System.Windows.Forms.ListView.WmReflectNotify(Message& m)
   at System.Windows.Forms.ListView.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
              [ 84%]
tests/widgets/test_table.py::test_cell_icon PASSED                       [ 85%]

Jun 11 '24 01:06 rmartin16

Just saw this in that last re-run for Windows...curious to see a stacktrace in the tests....

Hrm... that's a new one for me - haven't seen that before.

Jun 11 '24 01:06 freakboy3742

As far as IconContentT is concerned, my understanding from reviewing the history was that you never were able to get the type annotation in the API docs to link to your c:type:: documentation....the best you could do is link to the ctype doc in the description of the argument itself.

I think I've ultimately landed on that for a lot of this a custom sphinx extension will be necessary to really achieve the desired balance between accurate types and comprehensible documentation.

Jun 11 '24 06:06 rmartin16