
Generic strings/regex patterns in strings

Open Fidget-Spinner opened this issue 3 years ago • 8 comments

Currently, we have str and LiteralString. Neither does a good job of expressing the range of values a string may have. Say we have the following function:

```python
from typing import TypeAlias

Email: TypeAlias = str

def send_email(email: Email) -> int:
    # Send an email
    ...

# passes
send_email("user@example.com")

# passes
send_email("badddd")

def forward(unknown_var: str) -> None:
    # passes even if unknown_var is only identified as `str`
    send_email(unknown_var)
```

We could improve it by having something like this:

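```python
from typing import TypeAlias

# Proposed syntax: `str` is not subscriptable today, so this line fails at runtime.
# This is a bad regex pattern, but you get the idea :)
Email: TypeAlias = str[r'[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+']

def send_email(email: Email) -> int:
    # Send an email
    ...

# passes
send_email("user@example.com")

# fails: "badddd" does not match the pattern
send_email("badddd")

def forward(unknown_var: str) -> None:
    # passes even if unknown_var is only identified as `str`
    send_email(unknown_var)
```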

A static type checker would then be able to validate string literals passed in by code. I would imagine this idea can be extended to the Pattern and Match generics as well, but I haven't thought too deeply about them yet.
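To make the proposal concrete, here is a toy version of the check a type checker might perform. It is a sketch under stated assumptions, not how any existing checker works: the helper `check_calls` and the constant `EMAIL_PATTERN` are hypothetical, the pattern is assumed to require a whole-string match (`re.fullmatch`), and, as in the example above, only string literals are checked while any other `str` argument is let through.

```python
import ast
import re

# Hypothetical: the pattern a checker would read from `str['...']` on the parameter.
EMAIL_PATTERN = re.compile(r"[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+")

SOURCE = '''
send_email("user@example.com")
send_email("badddd")
send_email(unknown_var)
'''


def check_calls(source: str, func_name: str, pattern: re.Pattern[str]) -> list[str]:
    """Report literal str arguments to func_name that don't fully match pattern."""
    errors: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
            and node.args
        ):
            arg = node.args[0]
            # Only string literals can be checked; anything else is just `str`.
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                if pattern.fullmatch(arg.value) is None:
                    errors.append(
                        f"line {node.lineno}: {arg.value!r} does not match {pattern.pattern!r}"
                    )
    return errors


if __name__ == "__main__":
    for error in check_calls(SOURCE, "send_email", EMAIL_PATTERN):
        print(error)
```

Only the `"badddd"` call is reported; the call with `unknown_var` passes because its value is unknown statically.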

If we don't want to make str generic, we could add a new type to typing called StrPattern. We would need it anyway for a backport in typing_extensions.

Fidget-Spinner, May 31 '22 14:05

There's some prior art in TypeScript: https://www.typescriptlang.org/docs/handbook/2/template-literal-types.html

It's not quite as powerful as this regex idea though.

henribru, May 31 '22 15:05

The TypeScript feature allows more than type checking though -- IIRC it allows constructing new (string literal) types from other strings.
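For readers unfamiliar with the TypeScript feature: when a template literal type such as `${Lang}-${Region}` interpolates unions of string literals, it denotes every possible concatenation of their members. Python's type system has no equivalent, so the following is only a runtime sketch of that expansion for finite `Literal[...]` unions. `expand_template`, `Lang`, `Region`, and `Locale` are hypothetical names, and `Locale` is invisible to static checkers because it is computed at runtime.

```python
from itertools import product
from typing import Literal, get_args

# Two finite string-literal types (hypothetical example names).
Lang = Literal["en", "ja"]
Region = Literal["US", "JP"]


def expand_template(*parts: object) -> tuple[str, ...]:
    """Expand fixed strings and Literal[...] types into every concrete string,
    mirroring how `${Lang}-${Region}` distributes over unions in TypeScript."""
    choices: list[tuple[str, ...]] = []
    for part in parts:
        if isinstance(part, str):
            choices.append((part,))  # fixed text between interpolations
        else:
            choices.append(tuple(str(arg) for arg in get_args(part)))
    return tuple("".join(combo) for combo in product(*choices))


LOCALES = expand_template(Lang, "-", Region)

# Build a Literal type from the computed strings (runtime only).
Locale = Literal[LOCALES]

if __name__ == "__main__":
    print(LOCALES)
```

Unlike TypeScript, nothing here lets a checker verify that a value of type `Locale` was built from a `Lang` and a `Region`; it only shows which strings such a type would contain.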

I personally don't think a feature to let type checkers use regex matching on literals is all that useful -- I'd rather use Email = NewType('Email', str) and leave the validation to runtime code that the type checker doesn't have to understand. (It's likely that you already have an email validation routine in your system, and it may not be easy to replicate its exact functionality as a regex.)
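A minimal sketch of that NewType approach, assuming a hypothetical `parse_email` helper and reusing the deliberately simple pattern from the issue as a stand-in for whatever validation routine a project already has:

```python
import re
from typing import NewType

Email = NewType("Email", str)

# Stand-in for an existing validation routine (hypothetical, deliberately simple).
_EMAIL_RE = re.compile(r"[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+")


def parse_email(raw: str) -> Email:
    """Validate at runtime, then brand the value as an Email."""
    if _EMAIL_RE.fullmatch(raw) is None:
        raise ValueError(f"not a valid email address: {raw!r}")
    return Email(raw)


def send_email(email: Email) -> int:
    # Send an email (stubbed out here)
    return 0


if __name__ == "__main__":
    print(send_email(parse_email("user@example.com")))
```

A type checker will reject `send_email("badddd")` because a plain `str` is not an `Email`; the intended way to obtain an `Email` is through the runtime check in `parse_email`.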

gvanrossum, May 31 '22 15:05

For reference, Pydantic (V2) supports and validates such regex-constrained, annotated strings: https://docs.pydantic.dev/2.4/api/types/#pydantic.types.StringConstraints
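Pydantic's `StringConstraints` is attached to `str` through `typing.Annotated` metadata. As a dependency-free sketch of the same idea (not Pydantic's implementation), the hypothetical `MatchesPattern` marker below is attached with `Annotated`, and a hypothetical `check_args` helper reads it back with `typing.get_type_hints(..., include_extras=True)` to validate arguments at runtime. Static checkers simply see `str`.

```python
import re
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class MatchesPattern:
    """Hypothetical metadata marker: the annotated str must fully match `regex`."""
    regex: str


Email = Annotated[str, MatchesPattern(r"[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+")]


def check_args(func, **kwargs: str) -> None:
    """Validate keyword arguments against MatchesPattern metadata on func's hints."""
    hints = get_type_hints(func, include_extras=True)
    for name, value in kwargs.items():
        hint = hints.get(name)
        if get_origin(hint) is not Annotated:
            continue  # no metadata to check
        for meta in get_args(hint)[1:]:  # first arg is the underlying type (str)
            if isinstance(meta, MatchesPattern) and re.fullmatch(meta.regex, value) is None:
                raise ValueError(f"{name}={value!r} does not match {meta.regex!r}")


def send_email(email: Email) -> int:
    # Send an email (stubbed out here)
    return 0


if __name__ == "__main__":
    check_args(send_email, email="user@example.com")  # passes silently
    print(send_email("user@example.com"))
```

Unlike the proposal in the issue, nothing is checked until `check_args` runs, which is the trade-off the NewType suggestion above also makes.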

juanmirocks, Oct 16 '23 15:10

To add another drawback: checking regexes in annotations statically is probably not a great idea, for reasons similar to why type checkers don't evaluate expressions. Pathological patterns, whether added maliciously or not, could cause ReDoS in many regex engines, including the one in Python's standard library. Unless the checker specifically uses a more constrained regex engine, merely checking out a PR could make analysis take an unreasonable amount of time.
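To illustrate the concern: the classic pathological pattern `(a+)+$` makes a backtracking engine such as CPython's `re` try exponentially many ways of splitting a run of `a`s before a failing match gives up. The sketch below times this; `PATHOLOGICAL` and `time_failed_match` are illustrative names, and absolute timings will vary by machine and Python version.

```python
import re
import time

# Nested quantifiers give a backtracking engine exponentially many ways to
# split the run of "a"s before it concludes that the match fails.
PATHOLOGICAL = re.compile(r"(a+)+$")


def time_failed_match(n: int) -> float:
    """Seconds spent failing to match 'a' * n + 'b' against PATHOLOGICAL."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = PATHOLOGICAL.match(subject)
    elapsed = time.perf_counter() - start
    if result is not None:
        raise RuntimeError("unexpected match")
    return elapsed


if __name__ == "__main__":
    # Keep n small: each extra character roughly doubles the work.
    for n in (12, 14, 16, 18):
        print(f"n={n}: {time_failed_match(n):.4f}s")
```

A type checker that evaluated such a pattern against a long string literal in an annotated call would stall in the same way.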

mikeshardmind, Oct 16 '23 16:10

Yeah, I've been thinking of scaling this down quite a bit and following the style of TypeScript's template literal types.

Fidget-Spinner, Oct 16 '23 17:10