jedi Completions fail with KeyError in parser_cache if fast

The autocompletion jedi.Script#complete fails with because the parso.cache.parser_cache is empty when setting jedi.settings.fast_parser = False.

Try this script to reproduce the issue:

import jedi

jedi.settings.fast_parser = False

script = jedi.Script(
    code="im", # Just starting a new file. Trying to get 'import' completed
    path="", # Using "" for unsaved files as described in the docstring
)

# Complete at the end of the code
completions = script.complete() 
print(completions)

Traceback:

Traceback (most recent call last):
  File "/home/benjamin/misc/jedi/./jedi-18-1-bug.py", line 12, in <module>
    completions = script.complete() 
  File "/home/benjamin/misc/jedi/jedi/api/helpers.py", line 487, in wrapper
    return func(self, line, column, *args, **kwargs)
  File "/home/benjamin/misc/jedi/jedi/api/__init__.py", line 214, in complete
    return completion.complete()
  File "/home/benjamin/misc/jedi/jedi/api/completion.py", line 170, in complete
    cached_name, completion_names = self._complete_python(leaf)
  File "/home/benjamin/misc/jedi/jedi/api/completion.py", line 314, in _complete_python
    completion_names += self._complete_global_scope()
  File "/home/benjamin/misc/jedi/jedi/api/completion.py", line 379, in _complete_global_scope
    for filter in filters:
  File "/home/benjamin/misc/jedi/jedi/inference/context.py", line 486, in get_global_filters
    yield from context.get_filters(
  File "/home/benjamin/misc/jedi/jedi/inference/context.py", line 318, in get_filters
    next(filters, None)
  File "/home/benjamin/misc/jedi/jedi/inference/value/module.py", line 63, in get_filters
    ParserTreeFilter(
  File "/home/benjamin/misc/jedi/jedi/inference/filters.py", line 138, in __init__
    super().__init__(parent_context, node_context)
  File "/home/benjamin/misc/jedi/jedi/inference/filters.py", line 100, in __init__
    self._parso_cache_node = get_parso_cache_node(
  File "/home/benjamin/misc/jedi/jedi/parser_utils.py", line 287, in get_parso_cache_node
    return parser_cache[grammar._hashed][path]
KeyError: '715ad56c5f4f8395092c58b6b6f2deb4f906f81380929a836bd86ab253634875'

Using git bisect I found that the first commit with this bug is https://github.com/davidhalter/jedi/commit/b9067ccdbbd308e96c5384765d1743f2da062d0e.

Oct 19 '22 14:10 HedgehogCode

This might also be related to https://github.com/davidhalter/jedi/commit/db0e90763be0f65de1a03a270f10272b46184892 and the comment in this commit. The issue didn't happen when I used None for the path parameter of jedi.Script. However, I am not sure if this is the solution or just a workaround.

Oct 19 '22 14:10 HedgehogCode

Using None there will trigger other issues. This looks like a bug.

Oct 20 '22 18:10 davidhalter

Hi, is there any progress on this?

I'm using parso to compare and analyze different versions of code, so I need to either turn off fast_parser or deepcopy the entire parso tree. Due to this bug, now I have to use deepcopy but it's about 3 times slower than just turning off fast_parser. Is there any simple fix to this cache error (or a faster alternative to using deepcopy?) Thanks!

Jan 19 '23 01:01 MrVPlusOne

Hi there! We hit this issue on an internal code analysis project that uses Jedi internally. We worked around it by patching jedi/inference/filters.py to cleanly handle the KeyError, but this feels like a workaround that doesn't really solve the underlying issue.

@davidhalter If you want, I can submit a PR with our patch, but I'm also willing to implement a more correct fix if you have a suggestion, so it can be upstreamed?

Feb 27 '24 15:02 mithaler

@mithaler I'm interested to find out what you're using Jedi for internally. You can also email me if you don't want to say it in public.

I understand that your patch fixes this issue, but at this point nobody understands why this happens (I don't either). So we somehow would need to understand why this happens, because if we don't we risk triggering a ton of other bugs.

Feb 27 '24 15:02 davidhalter

I'm interested to find out what you're using Jedi for internally.

Sure, I can sort of answer this: we want to detangle a large monorepo, so we're generating dependency trees at a function level. This requires us to recursively look up definitions of functions, so we're using a language server, in this case Jedi's.

I understand that your patch fixes this issue, but at this point nobody understands why this happens (I don't either). So we somehow would need to understand why this happens, because if we don't we risk triggering a ton of other bugs.

While I can't provide a test case here (it would involve revealing the internal source code we're analyzing), I can say that this only happens when we delve deeply into the dependency tree, and it happens unpredictably (that is, I see errors on different parts of the tree on different runs on the same code). I'm not entirely sure how Parso's cache works, but I suspect weak references being GC'ed are a possible culprit?

Feb 27 '24 16:02 mithaler

@mithaler If you're using a language server, are you sure fast_parser is False? That's not how almost everyone uses Jedi. The original issue is not very surprising to me, because almost nobody uses that code path AFAIK, so I'm not surprised there are bugs around caching (because it's not really well tested and likely an easy fixable bug once someone reads a bit of code).

Feb 27 '24 23:02 davidhalter

I'm not sure of that -- I just know that the stack trace we see is very similar (though not identical), and my commit, linked above, appears to fix it.

Feb 28 '24 16:02 mithaler

@davidhalter Regarding what @mithaler mentions here ie I can say that this only happens when we delve deeply into the dependency tree, and it happens unpredictably (that is, I see errors on different parts of the tree on different runs on the same code).., I also notice the exactly same thing.

@davidhalter Can you explain the purpose/implications of setting fast_parser as True or False ? At the moment, I care less about speed and more about correctness, so in that case, would it be better to set fast_parser to False ?

Mar 07 '24 03:03 anmolagarwal999

Setting fast_parser to False was mostly used by people that had issues with the incremental parsing of parso. This used to be a bit buggier, but now that I fuzzed it a lot it's very stable and has no known bugs. So disabling it is not interesting regarding correctness. It's just interesting if you wanted to get rid of exceptions, which is not really the case anymore.

But it's actually a good question, because the original poster might just use the fast_parser default. It's of course still a bug.

Mar 07 '24 08:03 davidhalter

jedi
jedi copied to clipboard

Completions fail with KeyError in parser_cache if fast_parser is False

jedi jedi copied to clipboard

Completions fail with KeyError in parser_cache if fast_parser is False

jedi
jedi copied to clipboard