rope profile auto-completion

By @JulianEberius The main changes I made to Rope concern auto-completion. It is very, very slow on large code bases (just try Rope’s completions on Rope’s code itself). I used Python’s profile module to track where the time is spent on a single completion request, and it was mainly file system related stuff that is called a lot of times for a single request. I added a lot of caching to my branch of Rope, but I didn’t fully check whether I introduced subtle bugs or changed semantics this way. It improved latency by several orders of magnitude, which was important as Sublime Text does „as-you-type“ autocompletion, so the operation needs to be performed in milliseconds (on almost every keystroke) even on large code bases.

Please see my commit on SublimePythonIDE for details, if you’re interested on tackling this problem (https://github.com/JulianEberius/SublimePythonIDE/commit/85c490cd64). Sadly the commit intermingles the caching with a second, minor, issue: case-insensitive completions. Some people prefer to just type, and not to worry about lower/upper case if the completion engine can return the right thing anyway. As I said, this is minor, compared to the performance issues.

To tackle the problems systematically, it’s probably best if you to some profiling of your own, I just used cProfile and gprof2dot to make graphs… If the completion performance is improved I would love to just add Rope as a submodule, instead of maintaining my own fork.

Dec 22 '13 00:12 mcepl

Interesting ... patch just with -p3 cleanly applies and doesn't break any tests ... https://travis-ci.org/mcepl/rope/builds/15833656 So, now we need to split speed-up improvements from the case insensitive matching (which should be a separate functionality and probably optional). Also, http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script and http://stefaanlippens.net/python_profiling_with_pstats_interactive_mode look like great resources on Python profiling.

Dec 22 '13 00:12 mcepl

In general, I think supporting case insensitive completions is a good idea. On the other hand, rope's performance problems requires a more detailed examination to understand why the existing caching mechanisms fail. When does rope's performance issues become noticeable? That should depend on the number of files in the project, I guess.

The main known performance issue in rope, besides the automatic Static Object Analysis which I suggested disabling in rope mainline in the mailing list a few times before, is "from largemodule import *". When concluded data (more details below) get invalidated, these names are dropped and are recalculated again when needed. I wonder if the slowness you observe is due to the same reason.

My comments follow:

diff --git a/lib/python2/rope/base/project.py b/lib/python2/rope/base/project.py
index 97d2dd3..c6f2b59 100644
--- a/lib/python2/rope/base/project.py
+++ b/lib/python2/rope/base/project.py
@@ -18,6 +18,7 @@ class _Project(object):
         self.prefs = prefs.Prefs()
         self.data_files = _DataFiles(self)
-    @utils.memoize
   def get_resource(self, resource_name):
       """Get a resource in a project.

Constructing File and Folder objects seems rather cheap; does this really help?

@@ -360,7 +361,7 @@ class _DataFiles(object):
             path += '.gz'
         return self.project.get_file(path)

## 

[email protected](1000)
 def _realpath(path):
     """Return the real path of `path`

I am worried that the overhead of searching the previous return values could be actually greater than the overhead of python's realpath/abspath/expanduser. Do you remember which of these three functions is the slowest? Can't we optimize that?

diff --git a/lib/python2/rope/base/pycore.py b/lib/python2/rope/base/pycore.py
index 32056a0..3aff2ff 100644
--- a/lib/python2/rope/base/pycore.py
+++ b/lib/python2/rope/base/pycore.py
@@ -183,6 +183,7 @@ class PyCore(object):
     #    packages, that is most of the time
     #  - We need a separate resource observer; `self.observer`
     #    does not get notified about module and folder creations
-    @utils.memoize
   def get_source_folders(self):
       """Returns project source folders"""
       if self.project.root is None:

The main problem with saving the return value of this function is that it would become invalid after creating folders in the project.

-    @utils.memoize
   def _find_source_folders(self, folder):
       for resource in folder.get_folders():
           if self._is_package(resource):

This does not seem necessary and adding the decorator for get_source_folders() should be adequate.

diff --git a/lib/python2/rope/base/pynames.py b/lib/python2/rope/base/pynames.py
index 79bba15..82ee2cc 100644
--- a/lib/python2/rope/base/pynames.py
+++ b/lib/python2/rope/base/pynames.py
@@ -101,6 +101,7 @@ class ImportedModule(PyName):
         self.level = level
         self.resource = resource
         self.pymodule = _get_concluded_data(self.importing_module)
-        self.cached_pyobject = None

Actually self.pymodule does the caching itself; it uses _ConcludedData objects. These objects should be reused unless a change to the project (like modifying a module) possibly makes the data invalid; for instance, if a module is changed, imports in other modules may become invalid. pycore.py manages such changes using project observers. The logic for managing changes in pycore.py was much more involved once, but was simplified because no significant improvement was observed then.

So I think it is important to find out if ImportedModule._get_module() calls resource_to_pyobjects() unnecessarily and, if so, why?

diff --git a/lib/python2/rope/base/utils.py b/lib/python2/rope/base/utils.py
index e35ecbf..4e433b8 100644
--- a/lib/python2/rope/base/utils.py
+++ b/lib/python2/rope/base/utils.py
@@ -76,3 +75,48 @@ class _Cached(object):
         if len(self.cache) > self.count:
             del self.cache[0]
         return result
+
+class memoize(object):
-    """cache the return value of a method

This makes me worried for two reasons:

Some return values become invalid and should be dropped; that is the reason for the existence of project.observers and concluded_data.
There is no limit on the saved objects.

diff --git a/lib/python2/rope/contrib/codeassist.py b/lib/python2/rope/contrib/codeassist.py
index 37433c2..4b02c89 100644
--- a/lib/python2/rope/contrib/codeassist.py
+++ b/lib/python2/rope/contrib/codeassist.py
@@ -4,14 +4,14 @@ import warnings
 def code_assist(project, source_code, offset, resource=None,
-                templates=None, maxfixes=1, later_locals=True):
-                templates=None, maxfixes=1, later_locals=True, case_sensitive=False):
   """Return python code completions as a list of `CodeAssistProposal`\s

Supporting case insensitive completions seems interesting...

 class _PythonCodeAssist(object):

 def __init__(self, project, source_code, offset, resource=None,
-                 maxfixes=1, later_locals=True):
-                 maxfixes=1, later_locals=True, case_sensitive=False):
       self.maxfixes = maxfixes
       self.later_locals = later_locals
-        self.case_sensitive = case_sensitive
-        self.startswith = _startswith if case_sensitive else _case_insensitive_startswith
       self.word_finder = worder.Worder(source_code, True)
       self.expression, self.starting, self.offset = \
           self.word_finder.get_splitted_primary_before(offset)
  @@ -315,7 +325,7 @@ class _PythonCodeAssist(object):
   def _matching_keywords(self, starting):
       result = []
       for kw in self.keywords:
-            if kw.startswith(starting):
-            if self.startswith(kw, starting):

I suggest calling startswith() and tolower().startswith() directly here depending on the value of self.case_sensitive and removing self.startswith.

Ali

Dec 22 '13 16:12 aligrudi