better gradual guarantee for un-annotated dict (and other container?) literals

Open carljm opened this issue 3 months ago • 6 comments

In the ecosystem, we see this kind of pattern:

def f(a: int, b: str): ...

x = { "a": 1, "b": "2" }

# Expected `int` found `Unknown | int | str`
f(x["a"], x["b"])
# or
f(**x)

In this case, an un-annotated dict literal is used implicitly as a heterogeneous TypedDict. We (along with mypy and pyrefly) report errors on these calls, because we infer the type of x as dict[str, Unknown | int | str]. (Pyrefly infers dict[str, int | str]; mypy infers dict[str, object].)

Pyright avoids this problem by falling back to dict[str, Unknown] rather than inferring a union value type, when the dict contents look heterogeneous.
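As a rough illustration of the two strategies described above, here is a toy model that works on type names as strings. It is not any checker's actual implementation: the function name and the `heterogeneous_fallback` flag are hypothetical, and the homogeneous case (return the single type) is an assumption that ignores the extra `Unknown` that ty unions in.

```python
def infer_value_type(value_types: list[str], heterogeneous_fallback: bool) -> str:
    """Toy model: choose a displayed value type for a dict literal."""
    # De-duplicate while keeping first-seen order.
    distinct = list(dict.fromkeys(value_types))
    if len(distinct) == 1:
        # Assumption: a homogeneous literal keeps its single value type.
        return distinct[0]
    if heterogeneous_fallback:
        # Fallback strategy: give up on precision for mixed values.
        return "Unknown"
    # Union strategy: join every distinct value type.
    return " | ".join(distinct)


values = ["int", "str"]  # value types of {"a": 1, "b": "2"}
union_type = f"dict[str, {infer_value_type(values, heterogeneous_fallback=False)}]"
fallback_type = f"dict[str, {infer_value_type(values, heterogeneous_fallback=True)}]"
```

With the union strategy, `x["a"]` has a type that is not assignable to `int`, which is where the call errors come from; with the fallback, every lookup is `Unknown` and the calls are accepted.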

A similar problem can occur with list literals (e.g. x = [1, "a"]; f(x[0], x[1])). We do see this in the ecosystem too, but it's less common than with dictionaries (probably because tuples are an attractive alternative, and implicit heterogeneity is supported for tuples).

Perhaps ideally this would be solved by inferring a more precise heterogeneous "implicit TypedDict" type for these literals, but this gets very difficult to handle correctly with mutations.
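For comparison, this is roughly what the user would have to write by hand today to get per-key types. The class name `Args` and the body of `f` are hypothetical; the sketch only shows the explicit form that an "implicit TypedDict" would have to infer and then keep sound under mutation.

```python
from typing import TypedDict


class Args(TypedDict):
    a: int
    b: str


def f(a: int, b: str) -> str:
    return f"{a}:{b}"


x: Args = {"a": 1, "b": "2"}

# Each key has its own declared type, so both call styles are accepted.
by_key = f(x["a"], x["b"])
by_unpack = f(**x)
```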

If we do implement a "gradual mode" vs "strict mode", it may be worth emulating pyright's behavior in the "gradual mode".

carljm, Sep 24 '25 16:09

FWIW, I think we should prioritize this. 2,471 new diagnostics is a massive amount. That averages to 17 new diagnostics per ecosystem project. They are obviously not evenly distributed, but a lot of projects are affected. I also see these errors in some local Python codebases (mypy_primer, ecosystem-analyzer) that previously had very few diagnostics. It may also not be immediately clear to users what is even going on:

state = {"name": "ABC", "counter": 0}

state["counter"] += 1  # Operator `+=` is unsupported between objects of type `str` and `Literal[1]`

print(state["name"].lower())  # Attribute `lower` on type `Unknown | str | int` may be missing

sharkdp, Sep 25 '25 08:09

Tentatively putting this in the beta milestone, but it's still not clear to me if we should unconditionally do this, or make it subject to a config parameter that would also toggle the unioning with Unknown that we do.

carljm, Sep 25 '25 14:09

Instead of falling back to Unknown, we could potentially return AnyOf[ValueType1, ValueType2, …]? This would be as forgiving as Unknown, but would still let us offer auto-completions when typing e.g. state["name"].lo<CURSOR>.

This would require a change to the aggregation logic here, where we currently reduce the completions for a union type to the intersection of the completions on the element types. This logic is correct, but we did already question whether unioning would be the more practically useful way of combining completions when we originally introduced it.
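The difference between the two ways of combining completions can be sketched with runtime classes standing in for inferred types. This is only an illustration under that assumption, not ty's completion code; all function names are hypothetical, and `dir()` is used as a stand-in for the attributes a language server would offer.

```python
def attribute_names(tp: type) -> set[str]:
    """Public attribute names of a runtime class, as a stand-in for completions."""
    return {name for name in dir(tp) if not name.startswith("_")}


def completions_intersection(types: list[type]) -> set[str]:
    """Names present on every union element (always safe to access)."""
    return set.intersection(*(attribute_names(t) for t in types))


def completions_union(types: list[type]) -> set[str]:
    """Names present on at least one union element (more forgiving)."""
    return set.union(*(attribute_names(t) for t in types))


elements = [str, int]  # element types of `str | int`
strict = completions_intersection(elements)
forgiving = completions_union(elements)
```

With the intersection, `lower` is not offered for `state["name"]` because `int` lacks it; with the union it is, at the cost of also offering `int`-only names such as `bit_length`.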

Another way to achieve similar results was suggested by @carljm: we could track two different types for each expression, the "correct" type and a "best guess" type for LSP use cases. For the strong-gradual-guarantee case, we could return Unknown as the "correct" type, and int | str as the "best guess" type.

sharkdp, Sep 29 '25 14:09

Where ty's current behavior definitely differs from mypy and pyrefly is when you explicitly annotate with dict[str, Any] in an attempt to get rid of these errors. This does not help in ty, because we always prefer the inferred type (see also #136):

from typing import Any

state: dict[str, Any] = {"name": "ABC", "counter": 0}

state["counter"] += 1  # Still an error!
print(state["name"].lower())  # Still an error!

sharkdp, Sep 30 '25 07:09

With the latest changes in https://github.com/astral-sh/ruff/pull/20927, my last comment here is no longer relevant, so I will move this from Beta to GA. Feel free to move it back if anyone disagrees.

sharkdp, Oct 17 '25 07:10

Since I filed this issue, Pyrefly seems to have implemented a version of the "infer implicit TypedDict" approach for dict literals with heterogeneous value types. It prevents some mutations that would change the type of a specific key, but allows others unsoundly:

from typing import reveal_type

def f(a: int, b: str): ...

x = { "a": 1, "b": "2" }  # shows `dict[str, int | str]` in inlay hint

x["b"] = 2  # error, even though the "official" type is dict[str, int | str]
x.update({"b": 2})  # no error, unsound

reveal_type(x)  # dict[str, int | str]
reveal_type(x["a"])  # int
reveal_type(x["b"])  # str

# no errors on either call:

f(x["a"], x["b"])
# or
f(**x)
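
To see why the unchecked `update` matters, here is the same sequence at runtime. The body of `f` is hypothetical (the original is just `...`); it is written to rely on `b` really being a `str`.

```python
def f(a: int, b: str) -> str:
    # Relies on `b` being a str at runtime.
    return f"{a}:{b.upper()}"


x = {"a": 1, "b": "2"}
before = f(**x)  # fine: "b" still maps to a str

x.update({"b": 2})  # the mutation that is accepted without an error

try:
    f(**x)
    failed = False
except AttributeError:
    # int has no `upper`, so the call that was reported as error-free fails.
    failed = True
```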

carljm, Dec 18 '25 00:12