[ty] Support custom builtins
Summary
Adds the support for custom builtins in ty. They can now be added by placing a __builtins__.pyi file in the project root and defining any necessary builtins.
Right now builtins.pyi is not supported because it's shadowed by Typeshed (Pyright does support both AFAIK), but I think this is not really a problem.
Fixes https://github.com/astral-sh/ty/issues/374
Test Plan
Added mdtests for custom builtins, did some manual tests on our project that makes heavy use of builtins.
Diagnostic diff on typing conformance tests
No changes detected when running ty on typing conformance tests ✅
mypy_primer results
Changes were detected when running on open source projects
tornado (https://github.com/tornadoweb/tornado)
- tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _T@next | _VT@next`
+ tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _VT@next | _T@next`
pydantic (https://github.com/pydantic/pydantic)
- pydantic/fields.py:943:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:943:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:983:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:983:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1026:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1026:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1066:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1066:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1109:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1109:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1148:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1148:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1188:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1188:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1567:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, Divergent], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`
+ pydantic/fields.py:1567:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`
Tanjun (https://github.com/FasterSpeeding/Tanjun)
- tanjun/dependencies/data.py:347:12: error[invalid-return-type] Return type does not match returned value: expected `_T@cached_inject`, found `Coroutine[Any, Any, _T@cached_inject | Coroutine[Any, Any, _T@cached_inject]] | _T@cached_inject`
+ tanjun/dependencies/data.py:347:12: error[invalid-return-type] Return type does not match returned value: expected `_T@cached_inject`, found `_T@cached_inject | Coroutine[Any, Any, _T@cached_inject | Coroutine[Any, Any, _T@cached_inject]]`
xarray (https://github.com/pydata/xarray)
- xarray/core/dataarray.py:5744:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `DataArray | Dataset`
+ xarray/core/dataarray.py:5744:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `T_Xarray@map_blocks | DataArray | Dataset`
- xarray/core/dataset.py:8873:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `DataArray | Dataset`
+ xarray/core/dataset.py:8873:16: error[invalid-return-type] Return type does not match returned value: expected `T_Xarray@map_blocks`, found `T_Xarray@map_blocks | DataArray | Dataset`
scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/build/wheel.py:98:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 41 diagnostics
+ Found 40 diagnostics
scikit-learn (https://github.com/scikit-learn/scikit-learn)
- sklearn/externals/array_api_extra/_lib/_at.py:300:17: warning[possibly-missing-attribute] Attribute `dtype` may be missing on object of type `int | slice[Any, Any, Any] | EllipsisType | Array | tuple[int | slice[Any, Any, Any] | EllipsisType | Array, ...]`
+ sklearn/externals/array_api_extra/_lib/_at.py:300:17: warning[possibly-missing-attribute] Attribute `dtype` may be missing on object of type `int | Array | tuple[int | slice[Any, Any, Any] | EllipsisType | Array, ...] | slice[Any, Any, Any] | EllipsisType`
- sklearn/externals/array_api_extra/_lib/_at.py:301:17: warning[possibly-missing-attribute] Attribute `shape` may be missing on object of type `int | slice[Any, Any, Any] | EllipsisType | Array | tuple[int | slice[Any, Any, Any] | EllipsisType | Array, ...]`
+ sklearn/externals/array_api_extra/_lib/_at.py:301:17: warning[possibly-missing-attribute] Attribute `shape` may be missing on object of type `int | Array | tuple[int | slice[Any, Any, Any] | EllipsisType | Array, ...] | slice[Any, Any, Any] | EllipsisType`
- sklearn/externals/array_api_extra/_lib/_at.py:308:25: error[invalid-argument-type] Argument to function `apply_where` is incorrect: Expected `Array`, found `int | slice[Any, Any, Any] | EllipsisType | Array | tuple[int | slice[Any, Any, Any] | EllipsisType | Array, ...]`
+ sklearn/externals/array_api_extra/_lib/_at.py:308:25: error[invalid-argument-type] Argument to function `apply_where` is incorrect: Expected `Array`, found `int | Array | tuple[int | slice[Any, Any, Any] | EllipsisType | Array, ...] | slice[Any, Any, Any] | EllipsisType`
static-frame (https://github.com/static-frame/static-frame)
- static_frame/core/bus.py:671:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Top[Index[Any]] | Top[Series[Any, Any]] | TypeBlocks | ... omitted 6 union elements, object_]`
- static_frame/core/bus.py:675:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Bus[Any], object_]`, found `InterGetItemILocReduces[Top[Bus[Any]] | TypeBlocks | Batch | ... omitted 6 union elements, generic[object]]`
+ static_frame/core/bus.py:671:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Top[Bus[Any]] | TypeBlocks | Batch | ... omitted 6 union elements, object_]`
+ static_frame/core/bus.py:675:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Bus[Any], object_]`, found `InterGetItemILocReduces[Top[Index[Any]] | TypeBlocks | Top[Bus[Any]] | ... omitted 6 union elements, generic[object]]`
- static_frame/core/series.py:4072:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[Self@iloc | SeriesHE[Any, Any], generic[object]]`
+ static_frame/core/series.py:4072:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[Bottom[Series[Any, Any]] | TypeBlocks | Batch | ... omitted 7 union elements, generic[object]]`
pandas-stubs (https://github.com/pandas-dev/pandas-stubs)
+ pandas-stubs/_typing.pyi:1223:16: warning[unused-ignore-comment] Unused blanket `type: ignore` directive
+ tests/frame/test_groupby.py:228:15: error[type-assertion-failure] Type `Series[Any]` does not match asserted type `Series[str | bytes | int | ... omitted 12 union elements]`
+ tests/frame/test_groupby.py:624:15: error[type-assertion-failure] Type `Series[Any]` does not match asserted type `Series[str | bytes | int | ... omitted 12 union elements]`
- Found 5100 diagnostics
+ Found 5103 diagnostics
No memory usage changes detected ✅
Hm, I'm not sure why colour-science's performance fell down more than twice. Back to the drawing board I guess.
EDIT: nevermind, I checked the report once again and that's not actually the case. phew.
Yep, I'll document this once(/if) it's merged (looks like it's in a different repo so a separate PR is necessary anyway).
Does seem useful for IPython stuff, though currently it would require making a builtins file in the project, which seems slightly hacky.
I like convention over configuration but also wouldn't mind it being an option. The performance is probably comparable between the two because one requires resolving the settings first and the other does an unnecessary resolve_module call
custom-builtins = "__builtins__.pyi"
This would be under a new analysis section.
I like convention over configuration but also wouldn't mind it being an option.
Right, that's why I was curious if there really was a convention here, or if this was currently just a pyright-specific thing. If e.g. pyrefly or mypy also does the same thing as pyright here, I'm happy to go with the crowd. But this is not the design I would choose otherwise
We could establish the convention. That's how conventions happen :)
It just feels slightly easier if all you have to do is: Create a __builtins__.pyi, done compared to create a builtins.pyi, set custom-builtins. It also makes it easier for users migrating from pyright
Edit: pyrefly also supports __builtins__.py https://pyrefly.org/en/docs/migrating-from-pyright/#extending-builtins
Yeah, it does seem that mypy does not support this feature (and it makes sense that Pyrefly does not support it since it only just released recently). I would be happy to swap to any other way to determine where (and whether) the project stores custom builtins if necessary.
I think one issue with the current implementation is that resolve_module resolves __builtins__.py from any search path, including third-party ones. We should definitely change that.
Yeah, that I definitely should look into fixing :D
It still seems very strange to me that, without any explicit configuration to opt into it, we would have very special treatment of a module name that (as far as I know) has no special treatment at runtime by the interpreter. That doesn't seem like intuitive behaviour at all to me; we'll have to make sure we document it well.
But if both pyright and pyrefly already agree on this design, then I'm okay with moving forward here.
Curious if @sharkdp or @carljm have any thoughts
I don't think Pyrefly agrees on this behavior, and quite frankly, from what I've seen in the Pyrefly GitHub this was not brought up to Pyrefly's devs at all (at least publicly). so 🤷🏼♂️ and yeah, when you put it this way it makes much less sense that I did it like this now
Edit: pyrefly also supports
__builtins__.pypyrefly.org/en/docs/migrating-from-pyright#extending-builtins
but (according to those docs), it only supports this feature if you put the __builtins__.pyi file in a directory that you've explicitly listed as an extra search path via pyrefly's site-package-path config setting. Pyrefly's site-package-path config setting is their equivalent of our extra-paths configuration setting, though unlike us (and contradicting the spec?) they give the paths specified by this setting the lowest priority in import resolution.
Giving this very special behaviour to __builtins__.pyi files that we specifically find in an extra-paths search path seems much more defensible to me than implicitly giving this very special behaviour to any __builtins__.pyi file from any search path.
But at that point, you've already said that the user will need some kind of explicit configuration to opt into this behaviour (they need to specify a certain directory in the extra-paths config setting). So it still seems to me that the clearest behaviour would be to have a specific custom-builtins config setting just for opting into this feature, where you can specify the specific file you want ty to treat as a custom builtins stub.
But at that point, you've already said that the user will need some kind of explicit configuration to opt into this behaviour (they need to specify a certain directory in the extra-paths config setting). So it still seems to me that the clearest behaviour would be to have a specific custom-builtins config setting just for opting into this feature, where you can specify the specific file you want ty to treat as a custom builtins stub.
One concern I have with the ability to specify an arbitrary path is that we may fail to resolve imports in that file if it isn't in one of the search paths. This is not an argument against it, it's just something that should be called out in the documentation.
We probably also want to make sure that ty raises typing errors in __builtins__.pyi, and the easiest way to do that is to require the file to be in project.root.
I think the documentation for this option should be quite extensive in general. It's an advanced option that should be treated with extreme care, in my opinion. Python's builtin classes are some of the most important types in Python. Typeshed updates and refines its annotations for these methods on a regular basis, but by using this option you won't benefit from any upstream improvements to these types. There's also a much higher risk of ty crashing if you use a custom builtins file, because nearly all of our tests are written using typeshed's builtins.pyi, and it's very fundamental to type-checking most Python code.
Ah no, sorry, please disregard my last comment -- I misunderstood the feature. This layers additional symbols on top of typeshed's builtins.pyi file, rather than replacing typeshed's builtins.pyi file entirely.
Yeah, this does not replace builtins but rather add to them, replacing entirely sounds crazy since basic types that are likely hardcoded to some extent (i.e. types of literals) can get 'destroyed', so it's for the best this is not done.
From my comment in the ty issue:
I'm thinking that we could probably have builtins.pyi in project root detected by default, and allow overriding the path to the stub in the config. That would mean overriding stubPath would require putting typings.builtins instead of just typings in the config, but also means we're not cornercasing a name that doesn't really mean anything (which alleviates a concern I've heard earlier).
Which is basically the same version proposed in https://github.com/astral-sh/ruff/pull/22021#issuecomment-3679305754 but with a defaut. Though I am unsure whether having the default set to __builtins__ or have no default value is better.
I will implement this some time next week, I am somewhat sick right now so gotta hope I won't take too long
IMO, defaulting to __builtins__.pyi seems fine and we could consider an option that allows you to rename the module name (it would be a module name and not a path). I would be okay not adding the option in this initial version. It's something we can easily add later based on user requests unless we're concerned that users make use of this feature by accident (again, I think that's unlikely, given the very specific module name)
Both pyright and pyrefly support __builtins__.pyi in the project root or in an extra-path; I think that's sufficient to establish a convention, and we should just make life easier for users and follow the same convention. I'm not that worried about special-casing the module name: it's a dunder name, which Python already tells people to expect to have special behavior, and it's not used as a module name by Python itself. It might not be the design I'd have chosen from a blank slate, but it's easy for users, and I don't think it's sufficiently problematic to warrant introducing an incompatibility with other type checkers over.
I also don't think we need to discriminate which search path we found __builtins__.pyi in. Neither pyright nor pyrefly actually do this (despite their documentation arguably implying otherwise). I tested this: if you create a venv and put __builtins__.pyi in its site-packages directory, and type check using that virtualenv, both pyright and pyrefly will discover the __builtins__.pyi and treat it as extra builtins.
Also, both pyright and pyrefly are happy to use it even if it is __builtins__.py, not __builtins__.pyi.
but (according to those docs), it only supports this feature if you put the
__builtins__.pyifile in a directory that you've explicitly listed as an extra search path
No, pyrefly also supports it in the project root, which I expect is the common case. The docs do say that, and I tested and verified it.
I also don't think we should implement a config option to specify the path to the __builtins__.py[i] file. That just seems like extra complexity in our implementation, and extra cognitive load for readers of our docs, to support something that no user has asked for and doesn't have any clear use case.
So basically I think we should implement the simplest-possible (and easiest to implement!) version of this feature, which matches pyright and pyrefly, and not overthink it :)
If Micha and Carl both agree on this design, that's good enough for me :-)
(and sorry for misreading the Pyrefly docs!)
Resolved the comments and rebased to fix a conflict with the recent crate change. I have one additional question about moving the builtins check into module.static_member, described above.
had to force push to satisfy clippy and formatter, so automerge will have to be reenabled, whoops
Very excited this got merged and the ref used in ty. Any chance there will be a new release soon so that ruff and my associated ruff pre commit hooks and also be aware of the file?
Nevermind, it is in place, guess I just need to figure out how to get ruff to use the same file as ty