ruff icon indicating copy to clipboard operation
ruff copied to clipboard

[ty] Fix bug in union builder optimization

Open zanieb opened this issue 6 days ago • 9 comments

Just poking around...

Applied to https://github.com/astral-sh/ruff/pull/22071

zanieb avatar Dec 19 '25 14:12 zanieb

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

astral-sh-bot[bot] avatar Dec 19 '25 14:12 astral-sh-bot[bot]

mypy_primer results

Changes were detected when running on open source projects
tornado (https://github.com/tornadoweb/tornado)
- tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _VT@next | _T@next`
+ tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _T@next | _VT@next`

Tanjun (https://github.com/FasterSpeeding/Tanjun)
- tanjun/dependencies/data.py:347:12: error[invalid-return-type] Return type does not match returned value: expected `_T@cached_inject`, found `Coroutine[Any, Any, _T@cached_inject | Coroutine[Any, Any, _T@cached_inject]] | _T@cached_inject`
+ tanjun/dependencies/data.py:347:12: error[invalid-return-type] Return type does not match returned value: expected `_T@cached_inject`, found `_T@cached_inject | Coroutine[Any, Any, _T@cached_inject | Coroutine[Any, Any, _T@cached_inject]]`

xarray (https://github.com/pydata/xarray)
- xarray/core/dataarray.py:5737:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `T_Xarray@map_blocks | DataArray | Dataset`
+ xarray/core/dataarray.py:5737:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `DataArray | Dataset`
- xarray/core/dataset.py:8866:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `T_Xarray@map_blocks | DataArray | Dataset`
+ xarray/core/dataset.py:8866:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `DataArray | Dataset`

jax (https://github.com/google/jax)
+ jax/_src/tree_util.py:295:31: error[invalid-argument-type] Argument to bound method `register_node` is incorrect: Expected `(Hashable, Iterable[object], /) -> T@register_pytree_node`, found `(_AuxData@register_pytree_node, _Children@register_pytree_node, /) -> T@register_pytree_node`
+ jax/_src/tree_util.py:298:31: error[invalid-argument-type] Argument to bound method `register_node` is incorrect: Expected `(Hashable, Iterable[object], /) -> T@register_pytree_node`, found `(_AuxData@register_pytree_node, _Children@register_pytree_node, /) -> T@register_pytree_node`
+ jax/_src/tree_util.py:301:31: error[invalid-argument-type] Argument to bound method `register_node` is incorrect: Expected `(Hashable, Iterable[object], /) -> T@register_pytree_node`, found `(_AuxData@register_pytree_node, _Children@register_pytree_node, /) -> T@register_pytree_node`
- Found 2799 diagnostics
+ Found 2802 diagnostics

static-frame (https://github.com/static-frame/static-frame)
- static_frame/core/bus.py:671:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Bus[Any] | Top[Index[Any]] | Top[Series[Any, Any]] | ... omitted 7 union elements, object_]`
+ static_frame/core/bus.py:671:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Bus[Any] | TypeBlocks | Batch | ... omitted 7 union elements, object_]`
- static_frame/core/series.py:772:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Series[Any, Any], TVDtype@Series]`, found `InterGetItemILocReduces[Series[Any, Any] | TypeBlocks | Batch | ... omitted 6 union elements, generic[object]]`
+ static_frame/core/series.py:772:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Series[Any, Any], TVDtype@Series]`, found `InterGetItemILocReduces[Series[Any, Any] | Top[Index[Any]] | TypeBlocks | ... omitted 6 union elements, generic[object]]`
- static_frame/core/series.py:4072:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[SeriesHE[Any, Any] | TypeBlocks | Batch | ... omitted 7 union elements, generic[object]]`
+ static_frame/core/series.py:4072:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[SeriesHE[Any, Any] | Top[Index[Any]] | TypeBlocks | ... omitted 7 union elements, generic[object]]`


No memory usage changes detected ✅

astral-sh-bot[bot] avatar Dec 19 '25 14:12 astral-sh-bot[bot]

This looks wrong, I'll keep poking in the background.

zanieb avatar Dec 19 '25 15:12 zanieb

It looks "correct" in the sense that it fixes the bug in my PR

MichaReiser avatar Dec 19 '25 15:12 MichaReiser

Oh does it? It looked like it caused some new problems too :)

zanieb avatar Dec 19 '25 15:12 zanieb

Oh does it? It looked like it caused some new problems too :)

It does. So I think you might be on to something but I honestly don't know enough about those union builder flags (or when we should set which flag)

MichaReiser avatar Dec 19 '25 15:12 MichaReiser

UnionBuilder::cycle_recovery is a flag required due to the technical constraint that query cycles cannot occur within cycle recovery functions. When this is true, a cheap union operation is performed without using is_redundant_with. This is fine for use within cycle normalization.

UnionBuilder::recursively_defined was introduced in #21683. Setting it to Yes will perform widening earlier, which will result in faster fixed-point convergence. The rule for recursively_defined is that this attribute is infectious, i.e. if you find a sub-union with recursively_defined = Yes, you should also mark the final union type as recursively_defined = Yes. If you are seeing a too-many-cycle panic due to unbounded growth of the literal type, you probably have recursively_defined set to No when it should be Yes.

There is no direct relationship between the two. recursively_defined = Yes should be annotated for union types of recursively defined expressions. Specifically, expressions with a type like T | Divergent are considered recursively defined (ref. https://github.com/astral-sh/ruff/blob/9809405b053280e3f0e47bb5c762108e70a15519/crates/ty_python_semantic/src/types.rs#L14075-L14080).

mtshiba avatar Dec 19 '25 15:12 mtshiba

@MichaReiser CI is green now 🤔

zanieb avatar Dec 19 '25 15:12 zanieb

@MichaReiser CI is green now 🤔

Unfortunately, it doesn't remove the prefect diagnostics anymore.

MichaReiser avatar Dec 19 '25 16:12 MichaReiser