Generic strings/regex patterns in strings
Currently, we have str and LiteralString. Neither does a good job of expressing the range of values a string may take. Let's say we have the following function:
from typing import TypeAlias

Email: TypeAlias = str

def send_email(email: Email) -> int:
    # Send an email
    ...

# passes
send_email("user@example.com")
# passes
send_email("badddd")
# passes even if unknown_var is only identified as `str`
send_email(unknown_var)
We could improve it by having something like this:
# This is a bad regex pattern, but you get the idea :)
# (Proposed syntax: `str` is not subscriptable today.)
Email: TypeAlias = str[r'[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+']

def send_email(email: Email) -> int:
    # Send an email
    ...

# passes
send_email("user@example.com")
# fails
send_email("badddd")
# passes even if unknown_var is only identified as `str`
send_email(unknown_var)
A static type checker would be able to validate strings passed in by code. I would imagine this idea can be extended to Pattern and Match generics as well, but I haven't thought too deeply about them yet.
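To make the "validate strings passed in by code" part concrete, here is a rough sketch of the kind of check a tool could run. It is not how any existing type checker works; `check_literal_args`, `EMAIL_PATTERN`, and `SOURCE` are names made up for illustration, and it only looks at string literals passed positionally to a function called by bare name.

```python
import ast
import re

# Hypothetical: the pattern that the proposed `str[...]` annotation would carry.
EMAIL_PATTERN = re.compile(r"[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+")

# Code under analysis, mirroring the calls in the example above.
SOURCE = '''
send_email("user@example.com")
send_email("badddd")
send_email(unknown_var)
'''

def check_literal_args(source: str, func_name: str, pattern: re.Pattern[str]) -> list[str]:
    """Report string-literal arguments to `func_name` that do not fully match `pattern`."""
    errors = []
    for node in ast.walk(ast.parse(source)):
        is_target_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        )
        if not is_target_call:
            continue
        for arg in node.args:
            # Only literals can be checked without running the program;
            # plain names such as `unknown_var` are skipped.
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                if pattern.fullmatch(arg.value) is None:
                    errors.append(f"line {node.lineno}: {arg.value!r} does not match")
    return errors

print(check_literal_args(SOURCE, "send_email", EMAIL_PATTERN))
# → ["line 3: 'badddd' does not match"]
```

Note that the `unknown_var` call is silently accepted here, which matches the "passes even if only identified as `str`" behaviour described above.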
If we don't want to make str generic, we could add a new type to typing called StrPattern. We would need it anyways to backport to typing_extensions.
There's some prior art of a kind in TypeScript: https://www.typescriptlang.org/docs/handbook/2/template-literal-types.html
It's not quite as powerful as this regex idea though.
The TypeScript feature allows more than type checking though -- IIRC it allows constructing new (string literal) types from other strings.
I personally don't think a feature to let type checkers use regex matching on literals is all that useful -- I'd rather use Email = NewType("Email", str) and leave the validation to runtime code that the type checker doesn't have to understand. (It's likely that you already have an email validation routine in your system, and it may not be easy to replicate its exact functionality as a regex.)
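A minimal sketch of that NewType approach, assuming a hypothetical `parse_email` helper as the single place where validation happens. The regex is only a placeholder for whatever validation routine your system already has, and the `send_email` body is a stand-in.

```python
import re
from typing import NewType

# A distinct type for static checkers; at runtime an Email is just a str.
Email = NewType("Email", str)

# Placeholder check (deliberately loose); swap in your real validation routine.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def parse_email(raw: str) -> Email:
    """Validate `raw` at runtime and return it branded as an Email."""
    if _EMAIL_RE.fullmatch(raw) is None:
        raise ValueError(f"not an email address: {raw!r}")
    return Email(raw)

def send_email(email: Email) -> int:
    # Stand-in for real sending; returns a dummy status code.
    return 0

send_email(parse_email("user@example.com"))  # OK for the type checker and at runtime
# send_email("badddd")  # a type checker rejects this: str is not Email
```

The type checker only has to track that a value went through `parse_email`; it never needs to understand the validation logic itself.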
For reference, Pydantic (V2) supports & validates such regex-constrained, annotated strings: https://docs.pydantic.dev/2.4/api/types/#pydantic.types.StringConstraints
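For anyone who wants the flavour of that without pulling in a dependency, here is a stdlib-only sketch of the same general idea: attach the pattern as `Annotated` metadata and check it at runtime. This is not Pydantic's API or implementation; `Pattern` and `validate` are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass
from typing import Annotated, Any, get_args

@dataclass(frozen=True)
class Pattern:
    """Hypothetical metadata marker carrying a regex constraint."""
    regex: str

# Static type checkers treat this as plain `str`; the metadata is for runtime use.
Email = Annotated[str, Pattern(r"[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+")]

def validate(value: str, annotated_type: Any) -> str:
    """Check `value` against every Pattern attached to `annotated_type`."""
    # get_args(Annotated[T, m1, m2, ...]) returns (T, m1, m2, ...).
    for meta in get_args(annotated_type)[1:]:
        if isinstance(meta, Pattern) and re.fullmatch(meta.regex, value) is None:
            raise ValueError(f"{value!r} does not match {meta.regex!r}")
    return value

validate("user@example.com", Email)  # returns the string unchanged
# validate("badddd", Email)          # raises ValueError
```

The constraint lives next to the type, but nothing here asks a static checker to evaluate the regex.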
To add another argument against this: having type checkers statically evaluate regexes in annotations is probably not a great idea, for much the same reason that type checkers don't evaluate arbitrary expressions. Pathological patterns, whether added maliciously or not, can cause ReDoS in many regex engines, including the one in Python's standard library. Merely checking out a PR could then make analysis take an unreasonable amount of time, unless the checker uses a more constrained regex engine.
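To illustrate the ReDoS concern, here is a small timing sketch using a textbook catastrophic-backtracking pattern with the standard library's `re` module. The pattern and the `time_failed_match` helper are chosen for illustration only; exact timings depend on the machine, but the time should grow very quickly as the input gets longer.

```python
import re
import time

# Nested quantifiers over the same character: a classic backtracking trap.
PATHOLOGICAL = re.compile(r"(a+)+$")

def time_failed_match(n: int) -> float:
    """Return the seconds spent failing to match n 'a's followed by one 'b'."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = PATHOLOGICAL.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' makes a match impossible
    return elapsed

# Input sizes are kept small on purpose; each extra character
# roughly doubles the work for this pattern.
for n in (8, 12, 16, 20):
    print(f"n={n:2d}: {time_failed_match(n):.4f}s")
```

A pattern like this sitting in an annotation would cost a type checker the same time on every analysis run, which is the scenario described above.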
Yeah, I've been thinking of scaling this down quite a bit and following the style of TypeScript's template literal types.