cursorless
avoid allocating hats to the first letter of a token
We could get much fancier than this, but after running this for a day it appears to help somewhat, and it is nice and simple.
I propose that we declare that it fixes #1658, at least for now.
Checklist
- [/] I have added tests
- [/] I have updated the docs and cheatsheet
- [/] I have not broken the cheatsheet
I plan to keep running this for a little while longer, gathering data, but I thought I would share it in case anyone else wants to play with it.
(I know the tests are busted.)
here's another rev. lots of tests are still failing; it's going to be tedious to fix them, so I'd like to wait until we are relatively confident in the rest of the direction.
notes to self:
- correctly handle _abcTest (are we avoiding _ or a?)
- perf test
- maybe re-use tokenizers
- switch to ranges
- tests: stats, fixtures
- data gathering for end users
- no phones/replace
- jsonl
- open append/exclusive
- command payload
- rotate monthly
- include extension version
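A minimal sketch of what the data-gathering notes above (jsonl, rotate monthly, include extension version, command payload) might look like. The record shape, field names, and the `hat-usage-` file prefix are all hypothetical; nothing here is taken from the actual implementation.

```typescript
// Hypothetical record shape; all field names are assumptions for illustration.
interface HatUsageRecord {
  timestamp: string; // ISO 8601 timestamp
  extensionVersion: string; // version of the extension that wrote the record
  command: unknown; // the command payload, stored as-is
}

// One file per calendar month (UTC), so old data can be rotated out monthly.
function logFileName(date: Date): string {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1;
  return "hat-usage-" + year + "-" + (month < 10 ? "0" : "") + month + ".jsonl";
}

// JSONL: exactly one JSON document per line. JSON.stringify escapes any
// newlines inside strings, so a record can never span multiple lines.
function toJsonLine(record: HatUsageRecord): string {
  return JSON.stringify(record) + "\n";
}
```

In Node, each line could then be appended with something like `fs.appendFileSync(path, line)`, which opens the file in append mode by default.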
update: @AndreasArvidsson is going to have a look and take this one home if it's pretty close to mergeable in its current form
great, thanks!
@josharian Have you evaluated the difference between just avoiding the first character in the token versus the first character in every subword? When I first thought about this problem I kinda just envisioned the first character in the token, but your implementation is doing every subword, which could be better. Any insight?
I remember thinking at the time that doing subwords was important. But it is not something I ever gathered data about, because the effects are purely qualitative. And a lot of time has now gone by…
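To make the two options concrete, here is a rough sketch of which character offsets each strategy would avoid. The helper names and the subword regex are assumptions for illustration only, not the tokenizer this PR actually uses.

```typescript
// Hypothetical helper: start offsets of each subword in a token.
// A subword here is an acronym run, a (possibly capitalized) word, or a digit run.
function subwordStartOffsets(token: string): number[] {
  const subword = /[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+/g;
  const offsets: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = subword.exec(token)) !== null) {
    offsets.push(match.index);
  }
  return offsets;
}

// Offsets that hat allocation would try to avoid under each strategy.
function avoidedOffsets(token: string, mode: "token" | "subword"): number[] {
  if (mode === "token") {
    // Only the very first character of the token.
    return token.length > 0 ? [0] : [];
  }
  return subwordStartOffsets(token);
}

console.log(avoidedOffsets("parseHTTPResponse", "token")); // [ 0 ]
console.log(avoidedOffsets("parseHTTPResponse", "subword")); // [ 0, 5, 9 ]
console.log(avoidedOffsets("_abcTest", "token")); // [ 0 ]
console.log(avoidedOffsets("_abcTest", "subword")); // [ 1, 4 ]
```

The `_abcTest` case shows the open question from the notes: in this sketch the token strategy avoids `_`, while the subword strategy avoids `a` and `T`.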
I just did some performance tests. Using a single editor with TypeScript, the hat allocation went from about 6ms to 8ms. Percentage-wise that is quite a lot, but we can live with two milliseconds.