mypy Non-empty container types

Feature

It would be very helpful to have a way of specifying that a container type must be non-empty.

Pitch

As an example of why this feature would be helpful, consider the following function, average:

from typing import List


def average(numbers: List[float]) -> float:
    """Return the average of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)

Rather than relying on exceptions for handling the case when the number of numbers is zero, is seems like something along the lines of the following would be more convenient and more explicit:

from typing import List, NonEmpty


def average(numbers: NonEmpty[List[float]]) -> float:
    """Return the average of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)

I don't know anything about the way mypy is implemented so I'm not sure if this is a possible.

Nov 22 '20 19:11 AustinScola

This right here!

Came across this post, and wanted to see if I could do something similar using python`s types, without having to subclass, or similar.

Nov 22 '21 15:11 harahu

This would be very complicated to implement in full generality. Consider this code:

def f(lst: NonEmpty[list[float]]) -> None:
    lst.pop()
    average(lst) # is it still nonempty?

Nov 22 '21 15:11 JelleZijlstra

@JelleZijlstra I see your point. Maybe this shouldn't be allowed for a mutable container. But it still makes sense for, say a str, or what do you think?

EDIT: Realizing tuple was a bad example.

Nov 22 '21 15:11 harahu

An immutable container is easier, but it would still be a huge project. If you do str.strip(), does that give you another nonempty string?

Nov 22 '21 15:11 JelleZijlstra

An immutable container is easier, but it would still be a huge project. If you do str.strip(), does that give you another nonempty string?

This is not too different a case from how typing.NewType works today.

# 'output' is of type 'int', not 'UserId'`
`output = UserId(23413) + UserId(54341)

You should be pretty safe by letting the return type be the superclass, I'd imagine.

Nov 22 '21 15:11 harahu

@JelleZijlstra:

Having slept on it, I realize this has a lot more in common with NewType than I first though. In PEP484, the motivating for said type is the following (I`m quoting):

There are also situations where a programmer might want to avoid logical errors by creating simple classes. For example:

class UserId(int):
    pass

get_by_user_id(user_id: UserId):
    ...

This is quite close to what NonEmpty could look like, if breaking slightly from the pattern in the OP. Imagine a class NonEmptyStr(str). What would you want to change in such a subclass? Two things, essentially:

A validation at instantiation time, verifying that the value is indeed non-empty, to ensure you cannot do something like: NonEmptyStr(""), which should give greater run time safety than NewType, provided the object is immutable.
Optional changes to signatures of methods. NonEmptyStr("foo").title() could be annotated to return a NonEmptyStr, for instance, but if it isn't a default to just str (inherited) is still safe.

The first point is perhaps the most important, and is the main reason I don't reach for NewType more often today.

Generalizing this pattern, we see it pop up in other settings too: PositiveInt(int), Prime(PositiveInt), and so on.

Except for the need for validation, these are integers, and should behave as such in all cases. If I could annotate functions as needing a strict subset of ints, in a way that provides a runtime guarantee, I'd be all over it, provided it doesn't cost me performance outside of the validation.

Nov 23 '21 09:11 harahu

Not sure there is a term for this kind of subclassing. Often, when doing subclassing, one does so in order to equip the sub-class with a more powerful set of functionality. In this case we are doing sub-classing without adding more functionality, but rather to provide stricter guarantees for method and function types. Do you know if anybody makes such a distinction within type theory?

I'm realising that what I might want is a NewType equipped with validation, and tooling for overloading method singatures. At runtime, NonEmptyStr(some_str) could apply the validation function, without actually modifying the some_str object, such that you still have your plain old str, which means this is closer to a zero-cost abstraction. While at static analysis time, the type checker can safely treat this (actually a str) as of the narrower NonEmptyStr type.

Another perspective on this, is providing a concept similar to Literal, but usable for larger, finite, subsets and infinite subsets of a type. I mean in the sense that the type Literal["foo"] is a strictly analysis time subtype of str, and more useful than NewType because some checking is being done that you aren't making invalid claims, like foo: Literal["foo"] = "bar". What I want is being able to do the same for subsets I can't type out by hand. NonEmptyStr is just a special case of CompliantWithRegEx, in this sense.

I guess what I'm wondering is whether or not any of these ideas/perspectives provide a path to some or all of this functionality that isn't crazy costly to implement. If some of this does not sound like the ramblings of a madman, and you're kind of seeing where I'm going with this, I'd love some feedback on the feasibility of it, @JelleZijlstra. Do I need to go back to the drawing board?

Nov 24 '21 08:11 harahu

I can definitely see the use case for this! A related problem I'm thinking about is int types. In our codebase we often use ints to represent timestamps. We use a NewType for that (Utime), but if you do math on those (my_timestamp + 3 * DAY), the typechecker will just infer an int. A way to say that Utime + Utime = Utime would understand the program more precisely and catch issues where we confuse the Utime with another kind of int. I don't know what that would look like, though.

Nov 24 '21 15:11 JelleZijlstra

A related problem I'm thinking about is int types.

Highly relevant for us as well. Often want to annotate something as a StrictlyPositiveInt, and so on. Then, StrictlyPositiveInt + StrictlyPositiveInt -> StrictlyPositiveInt makes sense in the same way.

Ada has a related concept: https://learn.adacore.com/courses/intro-to-ada/chapters/strongly_typed_language.html#integers

Not sure I know exactly how this would be implemented either, but I think there is potential here for something that is both useful and a lot easier to implement than the originally proposed NonEmpty type.

Nov 24 '21 16:11 harahu

It seems that some form of specialization is required to express this. I could imagine some unified NewType and Protocol that specializes a type could be used to express these ideas.

Nov 24 '21 18:11 emmatyping

It seems that some form of specialization is required to express this. I could imagine some unified NewType and Protocol that specializes a type could be used to express these ideas.

I think you are on to something there! Would probably have to be a narrower kind of Protocol with no defaults or implementation allowed.

Just to mix in another already existing type: I imagine the runtime check part I'm missing with NewType could be achieved by providing a mechanism for optionally pairing your NewType with a TypeGuard that would result in a TypeError when trying NonEmptyStr(""), for instance.

Nov 25 '21 08:11 harahu

A NotEmpty type could be quite helpful to reduce the number of false-positives for the new possibly-undefined check and the pyright possibly unbound check.

/CC: @ilinum

def func(l: list[int]) -> None:
    for item in l:
        var = item + 4

    print(var)  # Name "var" may be undefined  [possibly-undefined]

This would be very complicated to implement in full generality. Consider this code:
def f(lst: NonEmpty[list[float]]) -> None:
    lst.pop()
    average(lst) # is it still nonempty?

I'm not sure it really matters. We could apply a strict interpretation where a container is only marked as NotEmpty if we know it contains at least one element. That might not cover all cases, but those could still use cast.

I.e.

def func(lst: NotEmpty[list[int]], lst2: list[int]) -> None:
    lst.pop()  # list[int]
    lst + lst2  # list[int]

    lst.append(2)  # NotEmpty[list[int]]
    lst + lst2  # NotEmpty[list[int]]

Dec 17 '22 17:12 cdce8p

/CC: @erictraut I think this could also help pyright.

Dec 17 '22 17:12 cdce8p

I just experienced a bug that NonEmpty[str] support could have avoided. Simplified example:

def stringify(value: int | None) -> str:
    return str(value) or ""

int.__str__ and NoneType.__str__ can never return an empty string. If they were typed as returning NonEmpty[str], then mypy could deduce that the or "" part of the conditional is never executed (dead code) and raise an error.

Because what was actually intended was str(value or "")

Mar 13 '23 14:03 intgr

Another category of bugs this could help is arithmetics between naive and tz-aware datetimes.

Mar 14 '23 00:03 ikonst

I've tried modifying datetime.pyi to add this behavior and quickly ran into the limitation of NewType requiring a concrete class, whereas datetime (as of Python 3.8) tries to preserve subclasses, so while I can do

+NaiveDatetime = NewType('NaiveDatetime', datetime)
+TzAwareDatetime = NewType('TzAwareDatetime', datetime)

 class datetime(date):
     ...
-    def replace(self: Self, ..., tz_info=_TzInfo | None = ..., ...) -> Self: ...
+    @overload
+    def replace(self, ..., tz_info=_TzInfo, ...) -> TzAwareDatetime: ...
+    @overload
+    def replace(self, ..., tz_info=None, ...) -> NaiveDatetime: ...

this would be a regression in subclass-preservation.

Back when PEP-593's Annotated was introduced, I thought it would enable us to overload-match on arbitrary traits (e.g. "tz-aware", "non-empty")

@overload
def replace(self: Self, ..., tz_info=_TzInfo, ...) -> Annotated[Self, TzAware]: ...

but at least the way the PEP intended and the way mypy implemented it, annotations appear to be stripped before any checking occurs.

Mar 14 '23 03:03 ikonst

@JelleZijlstra I realize I'm 2 years late, but are you aware of units-of-measure in F# (https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/units-of-measure)? "Types for Units-of-Measure: Theory and Practice" from Central European Functional Programming School 2009 might provide some ideas for how to implement that.

May 05 '23 14:05 pjonsson

I have a simple NonemptyString that I also use as a newtype base class. Eg, URL. https://github.com/public-law/open-gov-crawlers/blob/master/public_law/text.py

In Ruby, I have a nonempty Array which doesn't need to throw an exception: it initializes with a list head and tail. https://github.com/dogweather/non_empty_array

I believe that this general area is called Dependent Types. Lots of info in that topic.

Jun 26 '23 11:06 dogweather