
Speed up Plover startup responsiveness by delaying auxiliary data structure construction

user202729 opened this issue 3 years ago · 6 comments

Currently, the dictionaries take about 1-2 seconds to load.

Describe the solution you'd like

Most of the time is spent constructing the reverse-lookup and case-reverse-lookup mappings. That construction can be done lazily (on request), or in another thread.

It's already possible to look up entries in the dictionary without these structures.
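A minimal sketch of the lazy approach (illustrative only, not Plover's actual StenoDictionary class): the reverse mapping is built on the first reverse lookup, so loading itself stays cheap.

from collections import defaultdict

class LazyReverseDict:
    def __init__(self, entries):
        self._dict = dict(entries)  # strokes -> translation
        self._reverse = None        # constructed on demand

    def lookup(self, strokes):
        # Forward lookup needs no auxiliary structure.
        return self._dict.get(strokes)

    def reverse_lookup(self, translation):
        if self._reverse is None:
            # Pay the construction cost only on first use.
            self._reverse = defaultdict(list)
            for strokes, word in self._dict.items():
                self._reverse[word].append(strokes)
        return self._reverse.get(translation, [])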

normalize_steno is harder to eliminate. Some possible ideas:

  • Do it in a background thread, then save the results back to the dictionary on changes, accepting incorrect lookups in the first few seconds.
    • Optionally, save the correction back to the file.
  • Compute a hash of the dictionary to quickly check whether the whole dictionary is already normalized (see the sketch after this list).
    • Unfortunately, this relies on hashing being faster than normalization, which only holds because hashing is implemented in a faster language.
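A sketch of the hash idea, under the assumption of a hypothetical sidecar file that records the digest of file contents previously verified as fully normalized (all names here are illustrative, not Plover's actual loader):

import hashlib
import json
from pathlib import Path

def load_json_dict(path, normalize):
    # hashlib runs in C, so hashing the raw bytes is far cheaper than
    # re-normalizing every key in Python.
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    marker = Path(str(path) + '.normalized')  # hypothetical sidecar file
    with open(path, encoding='utf-8') as f:
        entries = json.load(f)
    if marker.exists() and marker.read_text() == digest:
        # These exact file contents were verified as normalized before:
        # skip the expensive normalization pass entirely.
        return entries
    normalized = {normalize(k): v for k, v in entries.items()}
    if normalized == entries:
        marker.write_text(digest)  # remember for the next load
    return normalized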

Describe alternatives you've considered

  • Implement the whole thing in a faster programming language. (Actually not a bad idea, though it might make the program less portable; it might even be easier.)

(FWIW, I made a local change to remove normalize_steno on load; it is somewhat faster.)

user202729 · Apr 05 '21 15:04

I'd like normalize_steno to be optional, perhaps applied only if the dictionary has changed.

Another possible way to improve performance I've experimented with: cythonizing the Plover package.

benoit-pierre · Apr 05 '21 15:04

Prototype (actually works):

https://github.com/user202729/plover-json-lazy

Overrides the built-in JSON dictionary.

There is quite a lot of copy-and-paste from Plover's code.


Currently, when several plugins register the same entry point name, I think an arbitrary one is chosen by Plover (e.g. the RTF dictionary plugin can't override Plover's built-in RTF plugin: https://github.com/sammdot/plover-better-rtf).

The entry points specification says: "If different distributions provide the same name, the consumer decides how to handle such conflicts." (source: https://packaging.python.org/specifications/entry-points/)

At the moment, it happens to work on my machine.

Perhaps prioritizing the user's dictionary in case of conflict would be better, as sketched below.
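A consumer-side preference rule could look something like this sketch (Python 3.10+ importlib.metadata; the 'plover.dictionary' group name and the preference logic are assumptions, not Plover's actual resolution code):

from importlib.metadata import entry_points

def pick_dictionary_plugin(extension, preferred_dist=None):
    # All distributions registering this file extension as a dictionary format.
    candidates = [ep for ep in entry_points(group='plover.dictionary')
                  if ep.name == extension]
    # Prefer the named user-installed distribution over the built-in.
    for ep in candidates:
        if preferred_dist and ep.dist and ep.dist.name == preferred_dist:
            return ep.load()
    return candidates[0].load() if candidates else None

With that rule, pick_dictionary_plugin('json', preferred_dist='plover-json-lazy') would load the lazy plugin's class even though Plover's built-in also registers json.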


Performance: 1.4 s → 0.6 s, saving 0.8 s, although that also includes a hack to remove normalize_steno; otherwise it would have taken 2.2 s.

There are some more performance discussions in issue 1243. (In my case, I'm quite annoyed that I have to restart Plover frequently during development. By the way, my Plover restart stroke is defined as {PLOVER:SHELL:xterm -e bash -c "sleep 0.1s; plover; bash" &}{PLOVER:QUIT}.)


Is this a desirable feature?

user202729 · Apr 23 '21 14:04

One way to speed up normalization is caching. Dropping @functools.lru_cache(maxsize=None) in front of normalize_stroke will speed things up a bit since some strokes are much more common than others. The stats I got for my dictionary loadout were:

CacheInfo(hits=310276, misses=35377, maxsize=None, currsize=35377)
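For illustration, the same decoration on a toy stand-in (the body below is not Plover's real normalize_stroke; only the decorator matters):

import functools

@functools.lru_cache(maxsize=None)
def normalize_stroke(stroke):
    # Toy stand-in body: memoizes one result per unique stroke string.
    return stroke.upper()

for stroke in ('STROEBG', 'KAT', 'STROEBG', 'KAT', 'STROEBG'):
    normalize_stroke(stroke)

print(normalize_stroke.cache_info())
# CacheInfo(hits=3, misses=2, maxsize=None, currsize=2)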

Somehow, the Plover process on my machine ends up using less total memory after loading completes with the cache than it does without it, even with an explicit garbage collection pass at the end. I have no idea what Python is doing with its heap to cause this, but I don't expect the actual memory cost of the cache to be very much anyway since most of the string objects in it are shared with the dictionaries.

fourshade · May 23 '21 03:05

About the memory usage part: try sys.intern?

Besides, Python doesn't always give free memory back to the operating system; it uses its own allocator.
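For reference, sys.intern collapses equal strings into one shared object, which is the deduplication being suggested here; a quick demonstration:

import sys

# Strings built at runtime are not automatically interned in CPython:
a = ''.join(['TEF', 'T'])
b = ''.join(['TEF', 'T'])
print(a == b, a is b)  # True False: equal values, distinct objects

a, b = sys.intern(a), sys.intern(b)
print(a is b)          # True: both now reference one shared string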

user202729 · May 23 '21 03:05

I remember trying sys.intern in a few places long ago and had no luck getting improvements of either type (performance or memory). If you can find a clever spot to put it that helps, please share!

No time tonight to dig into heap profiling tools, but it does appear that clearing the cache manually at the end releases 1.3 MB back to the operating system. That seems reasonable enough.
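For reference, functions wrapped with functools.lru_cache expose cache management directly, so clearing is a one-liner:

normalize_stroke.cache_clear()  # drop all memoized entries after loading completes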

fourshade · May 23 '21 04:05

I have an idea of what could be happening; there could be a net savings of memory after garbage collection if the cache is reusing string objects. In a dictionary created using the cache, the keys will only contain references to one string object for each unique stroke read from the file. Without it, str.split could fill the dictionary with distinct copies of identical strings. Observe:

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)] on win32
>>> strings = ["this is a test", "this is a test", "a test this is"]
>>> words = [word for s in strings for word in s.split()]
>>> print({id(word): word for word in words})
{54240432: 'this', 54241072: 'is', 31515440: 'a', 54240560: 'test', 54241136: 'this', 54240496: 'is', 54241008: 'test', 54241264: 'test', 54241328: 'this', 54241392: 'is'}

CPython does have an internal string cache, but it only extends to single characters by default (note the string 'a' has only one instance). My OS reported a total savings of around 18 MB; I'll see if I can get more detail on the memory structure of the completed steno dictionaries.

UPDATE: The memory savings are confirmed. Calling sys.getsizeof on the recursive/memoized contents of the raw dictionaries gives a total of 41,098,908 bytes before caching and 24,773,536 bytes after. This wasn't really the purpose of the cache, but it's a nice side effect.
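The exact measurement helper isn't shown in this thread, but a recursive, memoized sys.getsizeof along these lines (a sketch; it only descends into dicts and common sequence types) produces comparable totals:

import sys

def deep_getsizeof(obj, seen=None):
    # Count each object once, so strings shared between entries are
    # only charged a single time (the memoized part).
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size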

fourshade · May 23 '21 22:05