pyright icon indicating copy to clipboard operation
pyright copied to clipboard

IntelliSense is not reactive with sympy

Open denismazzucato opened this issue 1 year ago • 23 comments

Environment data

  • Language Server version: 2023.12.1
  • OS and version: macOS Sonoma 14.2
  • Python version: 3.12.0
  • Conda version 23.10.0
  • Sympy version: 1.12

Code Snippet

from sympy import Symbol

x = Symbol("x")

Expected behavior

Hovering over the the variable x should show that x is a Symbol in no-time, or auto-completion should appear instantly when typing anywhere in the code above.

Actual behavior

It takes about 5 seconds to show the type information when hover with the cursor on x, and about 12 seconds to show any auto-completion.

The problem is using Symbol from sympy as all the other features seems to not run any delay. For example, the code below does not halt IntelliSense:

import sympy as sym

x = sym.var("x")

Logs

pylance-sympy-trace.log

denismazzucato avatar Jan 29 '24 15:01 denismazzucato

Sympy perf was recently mentioned in https://github.com/microsoft/pyright/issues/7130. Eric was looking for someone to provide a concrete example, so I'll transfer this issue over to Pyright.

debonte avatar Jan 29 '24 18:01 debonte

The sympy library claims to be a "typed library" by virtue of the fact that it includes a "py.typed" marker file (as outlined in PEP 561). However, sympy unfortunately contains almost no type information. That means pyright will attempt to infer type information (function return types, etc.) from the code.

You would normally be able to disable this behavior in pyright by setting the useLibraryCodeForTypes option to false, but this setting applies only to libraries that are not marked as "py.typed". When a library claims to be typed, pyright honors that designation and acts accordingly.

I ran pyright --verifytypes sympy to determine the "type completeness" of the library. Most "py.typed" libraries are 50% or more "type complete". Sympy is currently just 6%. It really has no right to claim that it's a typed library.

Symbols exported by "sympy": 35039
  With known type: 2107
  With ambiguous type: 1808
  With unknown type: 31124

Type completeness score: 6%

I'm not sure why sympy maintainers decided to add a "py.typed" marker file when the library is not typed. Perhaps they didn't understand the implications of doing so.

For most libraries, pyright can infer return types relatively quickly. Sympy, however, contains many functions and methods that have complex code flow graphs and large numbers of variables, which makes type inference very costly.

If this library is important to you, I recommend reaching out to the maintainers to ask them to 1) remove the "py.typed" marker, since the library is clearly not typed, or 2) add more type annotations to the library's public interface.

In the meantime, you could manually delete the "py.typed" marker and then set useLibraryCodeForTypes to "false" within your project. This will improve performance but it will also unfortunately eliminate all language server features that rely on type information (including completion suggestions).

You might also be able to find type stub packages for the sympy library. If installed, they would override the sympy library itself. If you can't find an existing type stub package, you could attempt to create a partial set of stubs yourself.

I'm going to close this issue because I don't see anything that we can do in pyright to address this issue.

erictraut avatar Jan 30 '24 02:01 erictraut

Most "py.typed" libraries are 50% or more "type complete". Sympy is currently just 6%. It really has no right to claim that it's a typed library

Fair enough. I think we should remove the py.typed file from sympy then. I merged the PR (see https://github.com/sympy/sympy/pull/22337#issuecomment-950160277) to add the py.typed file to sympy without really understanding what the implications would be. The effect on checking downstream codebases was what worried me although I was more worried about the type checkers misreporting types rather than editors locking up the CPU and consuming many gigabytes of RAM.

I do think that pyright has performance problems highlighted by this. I usually have the pyright language server disabled in my editor because of this but I have just tested what happens if I reenable it and open a file from the sympy codebase. I can see that 1 core is at 100% for many minutes. I haven't waited long enough to see how long it would take for pyright to finish what it is doing. The memory used by node seems to climb up to 2GB, then drop down and climb up again repeatedly. I'm testing this with the py.typed file removed which doesn't seem to make any difference in this case.

I don't see how I could enable a language server that can perform like this just because I opened a file in the editor.

oscarbenjamin avatar Jan 30 '24 10:01 oscarbenjamin

I opened a SymPy PR about a year ago (https://github.com/sympy/sympy/pull/25103) and at the time I said:

Running pyright like this over the codebase takes about 5 minutes on this computer

Something seems to have become less efficient since then though. I just tested with the latest version of pyright (1.1.349) on the same computer and it not only takes longer than 5 minutes but ultimately crashes:

$ time pyright sympy

<--- Last few GCs --->

[21851:0x58bbd80]   814593 ms: Scavenge 1987.2 (2078.7) -> 1979.9 (2078.9) MB, 7.5 / 0.0 ms  (average mu = 0.795, current mu = 0.560) allocation failure; 
[21851:0x58bbd80]   814673 ms: Scavenge 1988.4 (2079.6) -> 1981.1 (2080.9) MB, 4.7 / 0.0 ms  (average mu = 0.795, current mu = 0.560) allocation failure; 
[21851:0x58bbd80]   814712 ms: Scavenge 1988.8 (2080.9) -> 1981.5 (2096.9) MB, 5.4 / 0.0 ms  (average mu = 0.795, current mu = 0.560) allocation failure; 


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb7a940 node::Abort() [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 2: 0xa8e823  [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 5: 0xf3a3e5  [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 6: 0xf4c8cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 7: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 8: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 9: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
10: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
11: 0x16fb6b9  [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]

real	13m35.492s
user	19m44.680s
sys	0m33.144s

oscarbenjamin avatar Jan 30 '24 12:01 oscarbenjamin

With pyright 1.1.300 I can run the full check in 7 minutes:

$ time PYRIGHT_PYTHON_FORCE_VERSION=1.1.300 pyright sympy | tee pyright.txt 
No configuration file found.
pyproject.toml file found at /home/oscar/current/active/sympy.
Loading pyproject.toml file at /home/oscar/current/active/sympy/pyproject.toml
Assuming Python version 3.11
Assuming Python platform Linux
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Searching for source files
Found 1517 source files
Emptying type cache to avoid heap overflow. Used 1888MB out of 2096MB.
Emptying type cache to avoid heap overflow. Used 1887MB out of 2096MB.
pyright 1.1.300
...
28040 errors, 48 warnings, 0 informations 
Completed in 439.575sec
WARNING: there is a new pyright version available (v1.1.300 -> v1.1.349).
Please install the new version or set PYRIGHT_PYTHON_FORCE_VERSION to `latest`

I suppose that the messages

Emptying type cache to avoid heap overflow. Used 1887MB out of 2096MB.

explain why memory usage goes up to 2GB and drops down again. Maybe this cache emptying is not being handled as well in pyright 1.1.349 which might explain why it crashes after running out of memory.

oscarbenjamin avatar Jan 30 '24 14:01 oscarbenjamin

I did some bisecting to see where sympy became slow. It looks like a change between 1.1.307 and 1.1.308 caused a large perf regression:

I tested with this code (and py.typed removed from the sympy site-packages folder)

import sympy

def f():
    sympy.acos(42)

1.1.307 Results:

pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 2.709sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.15sec
Tokenize:             0.18sec
Parse:                0.32sec
Resolve Imports:      0.28sec
Bind:                 0.33sec
Check:                1.05sec
Detect Cycles:        0sec

1.1.308 Results:

pyright 1.1.308
0 errors, 0 warnings, 0 informations
Completed in 12.676sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.16sec
Tokenize:             0.18sec
Parse:                0.33sec
Resolve Imports:      0.3sec
Bind:                 0.32sec
Check:                11.01sec
Detect Cycles:        0sec

rchiodo avatar Feb 21 '24 18:02 rchiodo

@rchido, were you able to narrow the problem to a specific commit?

erictraut avatar Feb 28 '24 03:02 erictraut

Sorry but I didn't check individual commits. Just the difference between the two releases so far.

rchiodo avatar Feb 28 '24 17:02 rchiodo

It's this one: https://github.com/microsoft/pyright/commit/6753a5d30535ba9a5708ff62494f841519fc54ce

Previous commit timing:

C:\Users\rchiodo\source\testing\Testing_Pyright>node c:\users\rchiodo\source\repos\pyright\packages\pyright\index.js --stats test_sympy.py
Loading configuration file at C:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (2018ms)
[FG] Long operation: analyzing: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (2245ms)
pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 2.607sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.05sec
Tokenize:             0.2sec
Parse:                0.3sec
Resolve Imports:      0.28sec
Bind:                 0.32sec
Check:                1.07sec
Detect Cycles:        0sec

6753a5d30535ba9a5708ff62494f841519fc54ce's timing:

C:\Users\rchiodo\source\testing\Testing_Pyright>node c:\users\rchiodo\source\repos\pyright\packages\pyright\index.js --stats test_sympy.py
Loading configuration file at C:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (10976ms)
[FG] Long operation: analyzing: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (11196ms)
pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 11.764sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.06sec
Tokenize:             0.2sec
Parse:                0.32sec
Resolve Imports:      0.31sec
Bind:                 0.37sec
Check:                9.9sec
Detect Cycles:        0sec

rchiodo avatar Feb 28 '24 18:02 rchiodo

@erictraut, I've been playing around with this to see what the difference was between those two changes. It seems like the newer change is just a lot more thorough in its evaluation of things. I added some instrumentation and found for the example, we:

  • Call assignTypes 50x more than we do in the previous change
  • Call validateArgs 10x more than we do in the previous change

I think this is because we go farther down the chain of subtypes to compute things. For instance, I put a filter in this function here to skip finding subtypes when the type comes from something in sympy:

https://github.com/microsoft/pyright/blob/0334a44b837cf799c134d4ac6405de3bf26978f6/packages/pyright-internal/src/analyzer/typeEvaluator.ts#L3867

That completely changes the amount of time we spend computing types, actually making it faster than the previous change.

Would that be a viable option? Have some way of making certain packages only explore subtypes to a certain level? (I actually implemented this in Pylance to see how it might work in practice). I don't think there's any obvious ways to fix the perf for every package though. It's really just computing the types for all of the untyped information in sympy that takes so long.

The alternative, I think, would be to generate type stubs for sympy and ship them with Pylance.

rchiodo avatar Mar 13 '24 00:03 rchiodo

I'll submit a PR with the proposed change to see if that's an option or not.

rchiodo avatar Mar 13 '24 16:03 rchiodo

I added a bunch of logging around validateArgType and assignTypes and found this difference:

These are number of times the function is called:

Post change

validateArgType:      3654
assignType:      1807613

Pre change:

validateArgType:      460
assignType:      36298

So then I did some investigation into when they start to differ, At __iterable from typing, the newer change returns isCompatible false the first time validate args is called:

Post change

validateArgType(["__iterable",7,96547) -> {"isCompatible":false,"argType":"ref","isTypeIncomplete":false,"skippedBareTypeVarExpectedType":false}]

Pre change

validateArgType(["__iterable",7,96547])->{"isCompatible":true,"argType":"ref","isTypeIncomplete":false,"skippedBareTypeVarExpectedType":false}]

So I started debugging the difference and it comes to this function call: https://github.com/microsoft/pyright/blob/43a6a8fae55c5b40ee040608271f9420a353d1a5/packages/pyright-internal/src/analyzer/constraintSolver.ts#L935

That's where's a code difference (I mean that makes sense given the diff in output). The older code checks to see if a TypeVar is live or not. The newer code doesn't. That's where the difference comes in.

But given my limited knowledge of what this is doing, that seemed on purpose to me, so not sure if I'm even going down the correct path to figure out why one was slower.

rchiodo avatar Mar 13 '24 20:03 rchiodo

I guess there's a little bit more information that I have. When I added logging for validateArgType, I could see that it was being called over and over for what looked like the same argument? Or at least the same name and type of argument. test.txt

The majority of the output there is for an argument named __iterable.

Here's how I logged that output:

function validateArgType(
        argParam: ValidateArgTypeParams,
        typeVarContext: TypeVarContext,
        signatureTracker: UniqueSignatureTracker,
        typeResult: TypeResult<FunctionType> | undefined,
        options: ValidateArgTypeOptions
    ): ArgResult {
        const result = validateArgTypeImpl(argParam, typeVarContext, signatureTracker, typeResult, options);
        timingStats.addCallstack(argParam.paramName, argParam.paramType.category, argParam.argument.node?.id, result); // Function I added to console log callstacks
        return result;
    }

I was wondering if it would be possible to cache the ArgResult? Maybe that doesn't work or isn't computable.

rchiodo avatar Mar 13 '24 21:03 rchiodo

If you see it being evaluated over and over again, that's because there's a loop present, and some types depend on others. We need to evaluate types until all of them "settle" to their final values. Until then, they are marked as "incomplete". Sympy's code contains many such loops (some of them doubly or triply nested) and many tricky dependencies between many variables whose types are not annotated. Pyright already aggressively caches the already-computed types for all parse nodes, but if the cached value is marked "incomplete", its value must be invalidated when some other node's type is updated within the loop.

These are tricky issues to diagnose. I've spent an hour on this one already, but in the past, similar issues have required many hours of focused time to really understand the root cause and then formulate potential mitigations (or conclude that no such mitigation is required).

erictraut avatar Mar 13 '24 23:03 erictraut

There's a workaround we're going to ship with Pylance. I added stubs to the https://github.com/microsoft/python-type-stubs repository. If you download the https://github.com/microsoft/python-type-stubs/tree/main/stubs/sympy and put that tree into your typings/sympy folder, Pyright will run a LOT faster with the stubs.

Without stubs:

Loading configuration file at c:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
No include entries specified; assuming c:\Users\rchiodo\source\testing\Testing_Pyright
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: file:///c%3A/Users/rchiodo/source/testing/Testing_Pyright/test_sympy.py (15660ms)
[FG] Long operation: analyzing: file:///c%3A/Users/rchiodo/source/testing/Testing_Pyright/test_sympy.py (15956ms)
pyright 1.1.351
0 errors, 0 warnings, 0 informations
Completed in 17.158sec

Analysis stats
Total files parsed and bound: 366
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    1.8sec
Tokenize:             0.35sec
Parse:                0.51sec
Resolve Imports:      0.54sec
Bind:                 0.68sec
Check:                11.97sec
Detect Cycles:        0sec

With stubs:

Loading configuration file at c:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
No include entries specified; assuming c:\Users\rchiodo\source\testing\Testing_Pyright
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
pyright 1.1.351
0 errors, 0 warnings, 0 informations
Completed in 1.047sec

Analysis stats
Total files parsed and bound: 115
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.01sec
Tokenize:             0.07sec
Parse:                0.09sec
Resolve Imports:      0.2sec
Bind:                 0.1sec
Check:                0.06sec
Detect Cycles:        0sec

rchiodo avatar Mar 20 '24 21:03 rchiodo

Thank you all for looking at this.

I don't fully understand the suggestion from @rchiodo. Is it that pyright would add those stub files or that sympy could add them to achieve the attended effect?

There has been some friction around adding type hints within sympy. It is possible that using stub files would be a more workable. I don't understand the implications of there being stub files in sympy as well as type hints in the code or stub files for sympy in pyright as well as type hints in the sympy codebase.

oscarbenjamin avatar Mar 20 '24 22:03 oscarbenjamin

You would add those stub files to your workspace typings folder. That's essentially what Pylance is doing (we have a folder we add to the list of places to search for stubs in).

I believe sympy is working on adding types, but they haven't released in more than a year. These stubs should be a temporary workaround until sympy is fully typed.

rchiodo avatar Mar 20 '24 22:03 rchiodo

You would add those stub files to your workspace typings folder.

To be clear I am a sympy maintainer. I am asking if you are suggesting that we (sympy) should do something and/or to understand what the implications are of whatever pyright is planning to do in relation to what we are doing or might do in future.

oscarbenjamin avatar Mar 20 '24 22:03 oscarbenjamin

To be clear, Pyright isn't doing anything with relation to the stubs. The stubs won't every be as correct as the real sympy code and using them may cause Pyright to not catch errors/show completions/etc.

The stubs are really for Pylance. Pyright users can install them locally too (if perf of sympy is a concern), but the stubs are mostly there to make Pylance go faster.

Longer term, I hope sympy becomes fully typed and we can just delete these extra type stubs. As a maintainer of sympy, it would be great if you could get sympy typed enough that Pyright is fast to analyze sympy. That'd probably be the point where we don't need these stubs anymore.

rchiodo avatar Mar 20 '24 23:03 rchiodo

Sorry I didn't answer this question:

There has been some friction around adding type hints within sympy. It is possible that using stub files would be a more workable. I don't understand the implications of there being stub files in sympy as well as type hints in the code or stub files for sympy in pyright as well as type hints in the sympy codebase.

Sympy can also take ownership of the stubs too. I believe it works if you simply add these pyi files to your source tree. The ones I generated were from the 1.12 release of sympy though. They don't match the dev branch.

rchiodo avatar Mar 20 '24 23:03 rchiodo

The main thing that makes Pyright/Pylance slow is unannotated return values. That's mostly what's in the stubs. Annotating all of the return values in sympy would make Pyright/Pylance a lot faster.

rchiodo avatar Mar 20 '24 23:03 rchiodo

You don't need to annotate every single symbol. You can

  1. annotate all public symbols.
  2. If that is too much, annotate most of the main symbols.
  3. If that is still too much, then you can annotate the core ones that others depend on.

If needed, we could provide a way for you to find (3), so you can strategically add annotations to symbols that matter most.

heejaechang avatar Mar 20 '24 23:03 heejaechang

As a maintainer of sympy, it would be great if you could get sympy typed enough that Pyright is fast to analyze sympy.

I agree but it is really hard to do. Symbolic computing is to some extent inherently weakly typed but also there are many historic design decisions in SymPy that make this harder. The big one is that in SymPy every different kind of symbolic expression node is a different Python class:

In [1]: e1 = sin(x)

In [2]: e2 = e1 + 1

In [3]: e1
Out[3]: sin(x)

In [4]: e2
Out[4]: sin(x) + 1

In [5]: type(e1)
Out[5]: sin

In [6]: type(e2)
Out[6]: sympy.core.add.Add

These types can also return different types from their constructors:

In [7]: e3 = Add(x, x) # We try to construct an Add but we get a Mul

In [8]: e3
Out[8]: 2⋅x

In [9]: type(e3)
Out[9]: sympy.core.mul.Mul

More discussion of this is in https://github.com/sympy/sympy/pull/25103 and https://github.com/python/mypy/issues/15182.

oscarbenjamin avatar Mar 21 '24 09:03 oscarbenjamin

I had a chance to look more deeply at this issue. I've tracked down the problem to a particular expression that spans about 750 lines in the assumptions_generated.py file. It is assigned to a variable named full_implications. I'll look for ways to improve the type evaluation performance for this expression.

erictraut avatar May 21 '24 20:05 erictraut

Just in case everyone is not aware there was a SymPy PR (https://github.com/sympy/sympy/pull/26528) related to this from @heejaechang that got merged. That probably means that there are differences between SymPy 1.12 vs current master relating to this issue.

oscarbenjamin avatar May 21 '24 23:05 oscarbenjamin

This is included in pyright 1.1.365.

erictraut avatar May 29 '24 04:05 erictraut