IntelliSense is not reactive with sympy
Environment data
- Language Server version: 2023.12.1
- OS and version: macOS Sonoma 14.2
- Python version: 3.12.0
- Conda version: 23.10.0
- Sympy version: 1.12
Code Snippet
from sympy import Symbol
x = Symbol("x")
Expected behavior
Hovering over the variable x should show that x is a Symbol almost instantly, and auto-completion should appear instantly when typing anywhere in the code above.
Actual behavior
It takes about 5 seconds to show the type information when hovering over x with the cursor, and about 12 seconds to show any auto-completion.
The problem seems to be specific to using Symbol from sympy, as all the other features show no delay. For example, the code below does not stall IntelliSense:
import sympy as sym
x = sym.var("x")
Logs
Sympy perf was recently mentioned in https://github.com/microsoft/pyright/issues/7130. Eric was looking for someone to provide a concrete example, so I'll transfer this issue over to Pyright.
The sympy library claims to be a "typed library" by virtue of the fact that it includes a "py.typed" marker file (as outlined in PEP 561). However, sympy unfortunately contains almost no type information. That means pyright will attempt to infer type information (function return types, etc.) from the code.
You would normally be able to disable this behavior in pyright by setting the useLibraryCodeForTypes option to false, but this setting applies only to libraries that are not marked as "py.typed". When a library claims to be typed, pyright honors that designation and acts accordingly.
I ran pyright --verifytypes sympy to determine the "type completeness" of the library. Most "py.typed" libraries are 50% or more "type complete". Sympy is currently just 6%. It really has no right to claim that it's a typed library.
Symbols exported by "sympy": 35039
With known type: 2107
With ambiguous type: 1808
With unknown type: 31124
Type completeness score: 6%
I'm not sure why sympy maintainers decided to add a "py.typed" marker file when the library is not typed. Perhaps they didn't understand the implications of doing so.
For most libraries, pyright can infer return types relatively quickly. Sympy, however, contains many functions and methods that have complex code flow graphs and large numbers of variables, which makes type inference very costly.
If this library is important to you, I recommend reaching out to the maintainers to ask them to 1) remove the "py.typed" marker, since the library is clearly not typed, or 2) add more type annotations to the library's public interface.
In the meantime, you could manually delete the "py.typed" marker and then set useLibraryCodeForTypes to "false" within your project. This will improve performance, but it will also unfortunately eliminate all language server features that rely on type information (including completion suggestions).
You might also be able to find type stub packages for the sympy library. If installed, they would override the sympy library itself. If you can't find an existing type stub package, you could attempt to create a partial set of stubs yourself.
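As a rough sketch of what such a partial stub could look like for the snippet at the top of this issue (hand-written and hypothetical, not taken from any published stub package):

# typings/sympy/__init__.pyi (hypothetical partial stub)
from typing import Any

class Symbol:
    # Symbol("x") takes a name plus optional assumption keywords
    # (e.g. positive=True); declaring the types here stops pyright
    # from inferring them out of sympy's implementation code.
    def __init__(self, name: str, **assumptions: Any) -> None: ...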
I'm going to close this issue because I don't see anything that we can do in pyright to address this issue.
Most "py.typed" libraries are 50% or more "type complete". Sympy is currently just 6%. It really has no right to claim that it's a typed library
Fair enough. I think we should remove the py.typed file from sympy then. I merged the PR (see https://github.com/sympy/sympy/pull/22337#issuecomment-950160277) to add the py.typed file to sympy without really understanding what the implications would be. What worried me was the effect on checking downstream codebases, although I was more concerned about type checkers misreporting types than about editors locking up the CPU and consuming many gigabytes of RAM.
I do think that pyright has performance problems, as highlighted by this. I usually have the pyright language server disabled in my editor because of this, but I have just tested what happens if I re-enable it and open a file from the sympy codebase. I can see that 1 core is at 100% for many minutes. I haven't waited long enough to see how long it would take for pyright to finish what it is doing. The memory used by node seems to climb up to 2GB, then drop down and climb up again repeatedly. I'm testing this with the py.typed file removed, which doesn't seem to make any difference in this case.
I don't see how I could enable a language server that can perform like this just because I opened a file in the editor.
I opened a SymPy PR about a year ago (https://github.com/sympy/sympy/pull/25103) and at the time I said:
Running pyright like this over the codebase takes about 5 minutes on this computer
Something seems to have become less efficient since then though. I just tested with the latest version of pyright (1.1.349) on the same computer and it not only takes longer than 5 minutes but ultimately crashes:
$ time pyright sympy
<--- Last few GCs --->
[21851:0x58bbd80] 814593 ms: Scavenge 1987.2 (2078.7) -> 1979.9 (2078.9) MB, 7.5 / 0.0 ms (average mu = 0.795, current mu = 0.560) allocation failure;
[21851:0x58bbd80] 814673 ms: Scavenge 1988.4 (2079.6) -> 1981.1 (2080.9) MB, 4.7 / 0.0 ms (average mu = 0.795, current mu = 0.560) allocation failure;
[21851:0x58bbd80] 814712 ms: Scavenge 1988.8 (2080.9) -> 1981.5 (2096.9) MB, 5.4 / 0.0 ms (average mu = 0.795, current mu = 0.560) allocation failure;
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb7a940 node::Abort() [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
2: 0xa8e823 [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
5: 0xf3a3e5 [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
6: 0xf4c8cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
7: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
8: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
9: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
10: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
11: 0x16fb6b9 [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
real 13m35.492s
user 19m44.680s
sys 0m33.144s
With pyright 1.1.300 I can run the full check in 7 minutes:
$ time PYRIGHT_PYTHON_FORCE_VERSION=1.1.300 pyright sympy | tee pyright.txt
No configuration file found.
pyproject.toml file found at /home/oscar/current/active/sympy.
Loading pyproject.toml file at /home/oscar/current/active/sympy/pyproject.toml
Assuming Python version 3.11
Assuming Python platform Linux
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Searching for source files
Found 1517 source files
Emptying type cache to avoid heap overflow. Used 1888MB out of 2096MB.
Emptying type cache to avoid heap overflow. Used 1887MB out of 2096MB.
pyright 1.1.300
...
28040 errors, 48 warnings, 0 informations
Completed in 439.575sec
WARNING: there is a new pyright version available (v1.1.300 -> v1.1.349).
Please install the new version or set PYRIGHT_PYTHON_FORCE_VERSION to `latest`
I suppose that the messages
Emptying type cache to avoid heap overflow. Used 1887MB out of 2096MB.
explain why memory usage goes up to 2GB and drops down again. Maybe this cache emptying is not handled as well in pyright 1.1.349, which might explain why it crashes after running out of memory.
I did some bisecting to see where analysis of sympy became slow. It looks like a change between 1.1.307 and 1.1.308 caused a large perf regression:
I tested with this code (and py.typed removed from the sympy site-packages folder):
import sympy

def f():
    sympy.acos(42)
1.1.307 Results:
pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 2.709sec
Analysis stats
Total files parsed and bound: 223
Total files checked: 1
Timing stats
Find Source Files: 0sec
Read Source Files: 0.15sec
Tokenize: 0.18sec
Parse: 0.32sec
Resolve Imports: 0.28sec
Bind: 0.33sec
Check: 1.05sec
Detect Cycles: 0sec
1.1.308 Results:
pyright 1.1.308
0 errors, 0 warnings, 0 informations
Completed in 12.676sec
Analysis stats
Total files parsed and bound: 223
Total files checked: 1
Timing stats
Find Source Files: 0sec
Read Source Files: 0.16sec
Tokenize: 0.18sec
Parse: 0.33sec
Resolve Imports: 0.3sec
Bind: 0.32sec
Check: 11.01sec
Detect Cycles: 0sec
@rchiodo, were you able to narrow the problem to a specific commit?
Sorry but I didn't check individual commits. Just the difference between the two releases so far.
It's this one: https://github.com/microsoft/pyright/commit/6753a5d30535ba9a5708ff62494f841519fc54ce
Previous commit timing:
C:\Users\rchiodo\source\testing\Testing_Pyright>node c:\users\rchiodo\source\repos\pyright\packages\pyright\index.js --stats test_sympy.py
Loading configuration file at C:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (2018ms)
[FG] Long operation: analyzing: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (2245ms)
pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 2.607sec
Analysis stats
Total files parsed and bound: 223
Total files checked: 1
Timing stats
Find Source Files: 0sec
Read Source Files: 0.05sec
Tokenize: 0.2sec
Parse: 0.3sec
Resolve Imports: 0.28sec
Bind: 0.32sec
Check: 1.07sec
Detect Cycles: 0sec
6753a5d30535ba9a5708ff62494f841519fc54ce's timing:
C:\Users\rchiodo\source\testing\Testing_Pyright>node c:\users\rchiodo\source\repos\pyright\packages\pyright\index.js --stats test_sympy.py
Loading configuration file at C:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (10976ms)
[FG] Long operation: analyzing: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (11196ms)
pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 11.764sec
Analysis stats
Total files parsed and bound: 223
Total files checked: 1
Timing stats
Find Source Files: 0sec
Read Source Files: 0.06sec
Tokenize: 0.2sec
Parse: 0.32sec
Resolve Imports: 0.31sec
Bind: 0.37sec
Check: 9.9sec
Detect Cycles: 0sec
@erictraut, I've been playing around with this to see what the difference was between those two changes. It seems like the newer change is just a lot more thorough in its evaluation of things. I added some instrumentation and found for the example, we:
- Call assignTypes 50x more than we do in the previous change
- Call validateArgs 10x more than we do in the previous change
I think this is because we go farther down the chain of subtypes to compute things. For instance, I put a filter in this function to skip finding subtypes when the type comes from something in sympy:
https://github.com/microsoft/pyright/blob/0334a44b837cf799c134d4ac6405de3bf26978f6/packages/pyright-internal/src/analyzer/typeEvaluator.ts#L3867
That completely changes the amount of time we spend computing types, actually making it faster than the previous change.
Would that be a viable option? Have some way of making certain packages only explore subtypes to a certain level? (I actually implemented this in Pylance to see how it might work in practice.) I don't think there are any obvious ways to fix the perf for every package, though. It's really just computing the types for all of the untyped information in sympy that takes so long.
The alternative, I think, would be to generate type stubs for sympy and ship them with Pylance.
I'll submit a PR with the proposed change to see if that's an option or not.
I added a bunch of logging around validateArgType and assignTypes and found this difference. These are the number of times each function is called:
Post change
validateArgType: 3654
assignType: 1807613
Pre change:
validateArgType: 460
assignType: 36298
So then I did some investigation into when they start to differ. At __iterable from typing, the newer change returns isCompatible false the first time validateArgType is called:
Post change
validateArgType(["__iterable",7,96547) -> {"isCompatible":false,"argType":"ref","isTypeIncomplete":false,"skippedBareTypeVarExpectedType":false}]
Pre change
validateArgType(["__iterable",7,96547])->{"isCompatible":true,"argType":"ref","isTypeIncomplete":false,"skippedBareTypeVarExpectedType":false}]
So I started debugging the difference, and it comes down to this function call: https://github.com/microsoft/pyright/blob/43a6a8fae55c5b40ee040608271f9420a353d1a5/packages/pyright-internal/src/analyzer/constraintSolver.ts#L935
That's where there's a code difference (which makes sense given the diff in output). The older code checks to see if a TypeVar is live or not; the newer code doesn't. That's where the difference comes in.
But given my limited knowledge of what this is doing, that seemed intentional to me, so I'm not sure if I'm even going down the correct path to figure out why one was slower.
I guess there's a little bit more information that I have. When I added logging for validateArgType, I could see that it was being called over and over for what looked like the same argument, or at least the same name and type of argument.
test.txt
The majority of the output there is for an argument named __iterable.
Here's how I logged that output:
function validateArgType(
    argParam: ValidateArgTypeParams,
    typeVarContext: TypeVarContext,
    signatureTracker: UniqueSignatureTracker,
    typeResult: TypeResult<FunctionType> | undefined,
    options: ValidateArgTypeOptions
): ArgResult {
    const result = validateArgTypeImpl(argParam, typeVarContext, signatureTracker, typeResult, options);
    // Function I added to console-log call stacks
    timingStats.addCallstack(argParam.paramName, argParam.paramType.category, argParam.argument.node?.id, result);
    return result;
}
I was wondering if it would be possible to cache the ArgResult? Maybe that doesn't work or isn't computable.
If you see it being evaluated over and over again, that's because there's a loop present, and some types depend on others. We need to evaluate types until all of them "settle" to their final values. Until then, they are marked as "incomplete". Sympy's code contains many such loops (some of them doubly or triply nested) and many tricky dependencies between many variables whose types are not annotated. Pyright already aggressively caches the already-computed types for all parse nodes, but if the cached value is marked "incomplete", its value must be invalidated when some other node's type is updated within the loop.
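As a hedged illustration (not code from sympy) of the kind of loop described above, where a variable's inferred type feeds back into itself and the checker must re-evaluate until the type settles:

# Hypothetical example of a type-inference feedback loop.
# Without annotations, the type of `result` inside the loop depends on
# the type `result` had on the previous iteration, so the checker keeps
# the node marked "incomplete" and re-evaluates until nothing changes.
def combine(items):
    result = None
    for item in items:
        if result is None:
            result = item            # first pass: the type of item
        else:
            result = result + item   # depends on result's prior type
    return result                    # known only once the loop settles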
These are tricky issues to diagnose. I've spent an hour on this one already, but in the past, similar issues have required many hours of focused time to really understand the root cause and then formulate potential mitigations (or conclude that no such mitigation is required).
There's a workaround we're going to ship with Pylance. I added stubs to the https://github.com/microsoft/python-type-stubs repository. If you download the stubs from https://github.com/microsoft/python-type-stubs/tree/main/stubs/sympy and put that tree into your typings/sympy folder, Pyright will run a LOT faster with the stubs.
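For anyone unfamiliar with the typings folder: pyright's default stubPath is typings at the workspace root, so the layout would look roughly like this (illustrative):

your-project/
├── pyrightconfig.json
└── typings/
    └── sympy/
        ├── __init__.pyi
        └── ... (remaining stub files from the repository above)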
Without stubs:
Loading configuration file at c:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
No include entries specified; assuming c:\Users\rchiodo\source\testing\Testing_Pyright
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: file:///c%3A/Users/rchiodo/source/testing/Testing_Pyright/test_sympy.py (15660ms)
[FG] Long operation: analyzing: file:///c%3A/Users/rchiodo/source/testing/Testing_Pyright/test_sympy.py (15956ms)
pyright 1.1.351
0 errors, 0 warnings, 0 informations
Completed in 17.158sec
Analysis stats
Total files parsed and bound: 366
Total files checked: 1
Timing stats
Find Source Files: 0sec
Read Source Files: 1.8sec
Tokenize: 0.35sec
Parse: 0.51sec
Resolve Imports: 0.54sec
Bind: 0.68sec
Check: 11.97sec
Detect Cycles: 0sec
With stubs:
Loading configuration file at c:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
No include entries specified; assuming c:\Users\rchiodo\source\testing\Testing_Pyright
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
pyright 1.1.351
0 errors, 0 warnings, 0 informations
Completed in 1.047sec
Analysis stats
Total files parsed and bound: 115
Total files checked: 1
Timing stats
Find Source Files: 0sec
Read Source Files: 0.01sec
Tokenize: 0.07sec
Parse: 0.09sec
Resolve Imports: 0.2sec
Bind: 0.1sec
Check: 0.06sec
Detect Cycles: 0sec
Thank you all for looking at this.
I don't fully understand the suggestion from @rchiodo. Is it that pyright would add those stub files, or that sympy could add them to achieve the intended effect?
There has been some friction around adding type hints within sympy, so it is possible that using stub files would be more workable. I don't understand the implications of there being stub files in sympy as well as type hints in the code, or stub files for sympy in pyright as well as type hints in the sympy codebase.
You would add those stub files to your workspace typings folder. That's essentially what Pylance is doing (we have a folder we add to the list of places to search for stubs in).
I believe sympy is working on adding types, but they haven't released in more than a year. These stubs should be a temporary workaround until sympy is fully typed.
You would add those stub files to your workspace typings folder.
To be clear I am a sympy maintainer. I am asking if you are suggesting that we (sympy) should do something and/or to understand what the implications are of whatever pyright is planning to do in relation to what we are doing or might do in future.
To be clear, Pyright isn't doing anything in relation to the stubs. The stubs won't ever be as correct as the real sympy code, and using them may cause Pyright to not catch errors/show completions/etc.
The stubs are really for Pylance. Pyright users can install them locally too (if perf of sympy is a concern), but the stubs are mostly there to make Pylance go faster.
Longer term, I hope sympy becomes fully typed and we can just delete these extra type stubs. As a maintainer of sympy, it would be great if you could get sympy typed enough that Pyright is fast to analyze sympy. That'd probably be the point where we don't need these stubs anymore.
Sorry I didn't answer this question:
There has been some friction around adding type hints within sympy. It is possible that using stub files would be a more workable. I don't understand the implications of there being stub files in sympy as well as type hints in the code or stub files for sympy in pyright as well as type hints in the sympy codebase.
Sympy can also take ownership of the stubs. I believe it works if you simply add these pyi files to your source tree. The ones I generated were from the 1.12 release of sympy, though; they don't match the dev branch.
The main thing that makes Pyright/Pylance slow is unannotated return values. That's mostly what's in the stubs. Annotating all of the return values in sympy would make Pyright/Pylance a lot faster.
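As a minimal before/after sketch of what annotating return values buys (Expr here is a stand-in class, not sympy's real hierarchy):

class Expr:  # stand-in for a library's expression base class
    pass

# Unannotated: pyright must infer the return type by analyzing the whole
# function body (and everything it calls); this is the expensive case.
def negate(e):
    return e

# Annotated: pyright takes the declared types at face value and can skip
# that inference entirely.
def negate_annotated(e: Expr) -> Expr:
    return e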
You don't need to annotate every single symbol. You can:
1. annotate all public symbols;
2. if that is too much, annotate most of the main symbols;
3. if that is still too much, annotate the core ones that others depend on.
If needed, we could provide a way for you to find (3), so you can strategically add annotations to the symbols that matter most.
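Regarding finding (3): as a hedged sketch, the report from pyright --verifytypes can be emitted as JSON with the real --outputjson flag and mined for unannotated exports. The exact field names below are assumptions about the report's shape and may need adjusting:

import json
import subprocess

# Run pyright's type-completeness report for sympy and parse the JSON.
proc = subprocess.run(
    ["pyright", "--verifytypes", "sympy", "--outputjson"],
    capture_output=True,
    text=True,
)
report = json.loads(proc.stdout)

# List exported symbols whose types pyright could not determine.
# ("typeCompleteness", "symbols", and "typeKnownStatus" are assumed names.)
for symbol in report.get("typeCompleteness", {}).get("symbols", []):
    if symbol.get("typeKnownStatus") != "Known":
        print(symbol.get("name"))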
As a maintainer of sympy, it would be great if you could get sympy typed enough that Pyright is fast to analyze sympy.
I agree, but it is really hard to do. Symbolic computing is to some extent inherently weakly typed, but there are also many historical design decisions in SymPy that make this harder. The big one is that in SymPy every different kind of symbolic expression node is a different Python class:
In [1]: e1 = sin(x)
In [2]: e2 = e1 + 1
In [3]: e1
Out[3]: sin(x)
In [4]: e2
Out[4]: sin(x) + 1
In [5]: type(e1)
Out[5]: sin
In [6]: type(e2)
Out[6]: sympy.core.add.Add
These types can also return different types from their constructors:
In [7]: e3 = Add(x, x) # We try to construct an Add but we get a Mul
In [8]: e3
Out[8]: 2⋅x
In [9]: type(e3)
Out[9]: sympy.core.mul.Mul
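For what it's worth, one way a stub could still describe such a constructor, as a hedged sketch (placeholder classes, not SymPy's real hierarchy), is to declare __new__ as returning the common base type:

from typing import Any

class Expr: ...
class Mul(Expr): ...

class Add(Expr):
    # Declaring the return type as Expr admits that Add(x, x) may
    # actually evaluate to a Mul (or any other Expr subclass).
    def __new__(cls, *args: Any, **kwargs: Any) -> Expr: ...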
More discussion of this is in https://github.com/sympy/sympy/pull/25103 and https://github.com/python/mypy/issues/15182.
I had a chance to look more deeply at this issue. I've tracked down the problem to a particular expression that spans about 750 lines in the assumptions_generated.py file. It is assigned to a variable named full_implications. I'll look for ways to improve the type evaluation performance for this expression.
Just in case everyone is not aware, there was a SymPy PR (https://github.com/sympy/sympy/pull/26528) related to this from @heejaechang that got merged. That probably means there are differences between SymPy 1.12 and current master relating to this issue.
This is included in pyright 1.1.365.