cursorless avoid allocating hats to the first letter of a token

We could get much fancier than this, but after running with this for a day, it appears to help some, and it is nice and simple.

I propose that we declare that it fixes #1658, at least for now.

Checklist

[/] I have added tests
[/] I have updated the docs and cheatsheet
[/] I have not broken the cheatsheet

Aug 02 '23 23:08 josharian

I plan to keep running this for a little while longer, gathering data, but I thought I would share it in case anyone else wants to play with it.

(I know the tests are busted.)

Aug 02 '23 23:08 josharian

here's another rev. lots of tests are still failing; it's going to be tedious to fix them, so I'd like to wait until we are relatively confident in the rest of the direction.

Aug 08 '23 02:08 josharian

notes to self:

correctly handle _abcTest (are we avoiding _ or a?)
perf test
maybe re-use tokenizers
switch to ranges
tests: stats, fixtures
data gathering for end users
- no phones/replace
- jsonl
- open append/exclusive
- command payload
- rotate monthly
- include extension version
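
A rough sketch of how the data-gathering items above (JSONL, monthly rotation, command payload, extension version) might fit together. Everything here is an assumption for illustration: the `UsageRecord` shape, its field names, and the `usage-YYYY-MM.jsonl` naming are hypothetical and not taken from this PR. The file write itself is left out; in Node it could be something like `fs.appendFileSync` in append mode.

```typescript
// Hypothetical record shape; the field names are illustrative only.
interface UsageRecord {
  timestamp: string; // ISO 8601 time of the command
  extensionVersion: string; // version of the extension that ran the command
  command: unknown; // the command payload, stored as-is
}

// One file per calendar month (UTC), so "rotation" is just a change of name.
function logFileName(date: Date): string {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1;
  const paddedMonth = month < 10 ? "0" + month : String(month);
  return "usage-" + year + "-" + paddedMonth + ".jsonl";
}

// JSONL: exactly one JSON object per line, newline-terminated, so the file
// can be appended to without rewriting what is already there.
function toJsonlLine(record: UsageRecord): string {
  return JSON.stringify(record) + "\n";
}
```

Keeping the naming and serialization as pure functions makes them easy to cover in tests, independent of how the file is opened.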

Aug 12 '23 01:08 josharian

update: @AndreasArvidsson is going to have a look and take this one home if it's pretty close to mergeable in its current form

Jun 20 '24 10:06 pokey

> update: @AndreasArvidsson is going to have a look and take this one home if it's pretty close to mergeable in its current form

great, thanks!

Jun 25 '24 00:06 josharian

@josharian Have you evaluated the difference between just avoiding the first character in the token versus the first character in every subword? When I first thought about this problem I kinda just envisioned the first character in the token, but your implementation is doing every subword, which could be better. Any insight?
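
To make the two options concrete, here is a minimal sketch of which character offsets each one would avoid. It is not the code in this PR: `firstCharOnly` and `subwordStartOffsets` are hypothetical helpers, and the subword rule used (camelCase humps, plus the first letter or digit after a separator such as `_`) is an assumption.

```typescript
// Option A: avoid only the first character of the token.
function firstCharOnly(token: string): number[] {
  return token.length > 0 ? [0] : [];
}

// Option B: avoid the first character of every subword.
// Assumed rule: a subword starts at a letter/digit that either follows a
// non-alphanumeric character (or the start of the token), or is an uppercase
// letter directly after a lowercase letter or digit (a camelCase hump).
function subwordStartOffsets(token: string): number[] {
  const offsets: number[] = [];
  for (let i = 0; i < token.length; i++) {
    const ch = token.charAt(i);
    if (!/[A-Za-z0-9]/.test(ch)) {
      continue; // separators never start a subword under this rule
    }
    const prev = i > 0 ? token.charAt(i - 1) : "";
    const afterSeparator = i === 0 || !/[A-Za-z0-9]/.test(prev);
    const camelHump = /[a-z0-9]/.test(prev) && /[A-Z]/.test(ch);
    if (afterSeparator || camelHump) {
      offsets.push(i);
    }
  }
  return offsets;
}

console.log(firstCharOnly("_abcTest")); // [0]
console.log(subwordStartOffsets("_abcTest")); // [1, 4]
```

Note that for `_abcTest` the two options do not even agree on the first avoided position: option A avoids `_`, while this version of option B avoids `a` and `T`.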

Jun 25 '24 04:06 AndreasArvidsson

I remember thinking at the time that doing subwords was important. But it is not something I ever gathered data about, because the effects are purely qualitative. And a lot of time has now gone by…

Jun 25 '24 04:06 josharian

I just did some performance tests. Using a single editor with TypeScript, hat allocation went from about 6 ms to 8 ms. Percentage-wise that is quite a lot, but we can live with two milliseconds.

Feb 22 '25 14:02 AndreasArvidsson
