cursorless Accept line number range without repeated direction word

Fixes #438

For example, "take row 5 past 9".

Checklist

[x] Check DFA compilation times on public version of Talon. You can add timings.py to your Talon user directory to do so
[x] Try "take row five past just nine"
[x] Try "take row five slice past seven"
[x] Try "take five slice past seven"
[x] I have added tests (currently not possible 😕)

Apr 18 '22 03:04 wolfmanstout

The biggest concern here is that we don't support "take row five and seven", and it's not clear how to support that without duplicating the top-level compound target grammar. That may be fine, but might be something we want th think through a bit

Apr 19 '22 16:04 pokey

@AndreasArvidsson is planning to do verification described in edited issue description

@pokey I tried to take a quick look at this but run into technical difficulties which I don't have time to solve right now. The problem is not with the code but with my setup and how to test talon code on a mono-repo. I'm going to have to look into this but I don't have time right now with work and all.

Apr 21 '22 06:04 AndreasArvidsson

@AndreasArvidsson is planning to do verification described in edited issue description

@pokey

I tried to take a quick look at this but run into technical difficulties which I don't have time to solve right now. The problem is not with the code but with my setup and how to test talon code on a mono-repo. I'm going to have to look into this but I don't have time right now with work and all.

No worries. Shall we plan to discuss testing talon code in monorepo at the next meet-up you're able to make?

Apr 21 '22 07:04 pokey

@AndreasArvidsson is planning to do verification described in edited issue description

@pokey I tried to take a quick look at this but run into technical difficulties which I don't have time to solve right now. The problem is not with the code but with my setup and how to test talon code on a mono-repo. I'm going to have to look into this but I don't have time right now with work and all.

No worries. Shall we plan to discuss testing talon code in monorepo at the next meet-up you're able to make?

That would be good! :)

Apr 21 '22 11:04 AndreasArvidsson

Thanks folks for taking a look! Does it make sense for me to implement any of these changes, or should I wait on what sounds like some higher level discussion (e.g. how to support "take row five and seven")? Looks like I may be able to join in on the Tuesday Cursorless meeting if that's helpful.

@AndreasArvidsson I've been using symlinks to test the talon code here. I was pleased to find that Windows now supports symlinks, which means it is supported across all platforms.

Apr 21 '22 13:04 wolfmanstout

I'm also around for the Sunday meeting, so whichever is better. Unless you think this is best discussed offline.

Apr 21 '22 13:04 wolfmanstout

Tuesday is good for me. I might attend Sunday but it's a bit late. Yes I know I really have to look into symlinks on windows

Apr 21 '22 14:04 AndreasArvidsson

Here is how I symlinked my Talon directory on Windows:

Open an administrator command prompt (has to be admin, has to be cmd, not powershell).
cd %APPDATA%\talon\user
mklink /d cursorless-talon C:\Users\james\projects\cursorless\cursorless-talon

Apr 21 '22 15:04 wolfmanstout

@wolfmanstout Thanks!

Apr 26 '22 07:04 AndreasArvidsson

Discussed in meet up. Plan is to try adding support for "and" and seeing how it impacts DFA compilation.

Apr 26 '22 17:04 wolfmanstout

UPDATE: I misread the output. The slower ones are VS Code, since compilation would be logged when the command is spoken in that new context. I guess there is some impact here on DFA compilation. I will let you decide whether this is problematic.

Added support for "and", so I think this is ready. I'm not seeing a significant impact on DFA compilation on public Talon. I tested by saying "alt-tab" between my browser and VS Code several times. Before (the faster one is VS Code):

2022-05-03 12:26:27    IO [audio]=870.000ms [compile]=1246.507ms [emit]=82.556ms [decode]=0.414ms [total]=1329.477ms 
2022-05-03 12:26:29 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:26:29    IO [audio]=840.000ms [compile]=422.337ms [emit]=82.929ms [decode]=0.335ms [total]=505.601ms 
2022-05-03 12:26:34 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:26:34    IO [audio]=840.000ms [compile]=1236.606ms [emit]=79.085ms [decode]=0.348ms [total]=1316.039ms 
2022-05-03 12:26:36 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:26:36    IO [audio]=900.000ms [compile]=406.907ms [emit]=85.763ms [decode]=0.400ms [total]=493.071ms 
2022-05-03 12:26:41 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:26:41    IO [audio]=900.000ms [compile]=1125.349ms [emit]=87.742ms [decode]=0.522ms [total]=1213.613ms 
2022-05-03 12:26:42 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:26:42    IO [audio]=1290.000ms [compile]=463.282ms [emit]=108.860ms [decode]=0.541ms [total]=572.683ms

After:

2022-05-03 12:27:13    IO [audio]=870.000ms [compile]=1340.402ms [emit]=89.701ms [decode]=0.453ms [total]=1430.556ms 
2022-05-03 12:27:15 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:27:15    IO [audio]=840.000ms [compile]=435.231ms [emit]=84.587ms [decode]=0.349ms [total]=520.167ms 
2022-05-03 12:27:19 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:27:19    IO [audio]=1020.000ms [compile]=1359.158ms [emit]=89.376ms [decode]=0.479ms [total]=1449.013ms 
2022-05-03 12:27:21 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:27:21    IO [audio]=840.000ms [compile]=449.949ms [emit]=83.241ms [decode]=0.270ms [total]=533.460ms 
2022-05-03 12:27:27 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:27:27    IO [audio]=900.000ms [compile]=1408.686ms [emit]=84.827ms [decode]=0.420ms [total]=1493.933ms 
2022-05-03 12:27:29 WARNING Skipped unknown tokens: ['!', '(', ')', ',', '-', '3', '?', '@', '[', ']', '`']
2022-05-03 12:27:29    IO [audio]=840.000ms [compile]=408.983ms [emit]=84.160ms [decode]=0.306ms [total]=493.449ms

May 03 '22 20:05 wolfmanstout

Also, I had to update my fork because there were merge conflicts due to upstream changes. I did a rebase, but for future reference, do you prefer merge, rebase, or a particular git-imerge mode for this?

May 03 '22 21:05 wolfmanstout

Correction: I had noted earlier that there was no significant DFA compilation time impact, but I've corrected that comment (there is some measurable impact).

May 03 '22 21:05 wolfmanstout

Sorry for the slow response here. 1200ms DFA is pretty brutal. Let me check on timeline for DFA hitting public. If it's going to be a while then we may think about having separate branches / tags or something

May 30 '22 12:05 pokey

Re merge vs rebase, we don't care too much

May 30 '22 12:05 pokey

@wolfmanstout looks like this one is back on the table now that DFA improvements have shipped to public. Is it ready for review or were there other tweaks you wanted to do? Looks like there are a couple of merge conflicts fwiw

Aug 03 '22 15:08 pokey

Unfortunately, there isn't an obvious easy/clean fix to the merge conflicts, to my eyes. There appear to have been some very sensible refactorings which tightened the scope of the cursorless_line_number capture, such that it is expected to only return the contents of "mark", as set in cursorless_primitive_mark. In the past, cursorless_primitive_mark called dict.update() to merge in any fields from the return value of cursorless_line_number. I was relying on that so that cursorless_line_number could return a range. I don't think there is a simple way to resolve this without essentially undoing what seems to be a clean refactoring. I can't just add support for ranges in cursorless_primitive_mark, because at that point I am no longer guaranteed that the two components of the range have the same type.

This isn't altogether a huge surprise; I was surprised that my hacky fix worked in the first place. Do you see an obvious fix? Seems like maybe this needs to be supported on the extension side?

Aug 04 '22 05:08 wolfmanstout

@wolfmanstout yeah I'd be inclined to add fields excludeAnchor and excludeActive to lineNumber target, pass through proper anchor and active rather than having them always equal, and handle things extension side. Shouldn't be too bad; lmk if you want a hand

Aug 04 '22 10:08 pokey

Thanks for the summary. Will be busy this weekend but can take a look later and see if I have questions. Of course if you want to pick this up don't wait on me.

Aug 05 '22 02:08 wolfmanstout

@pokey I took a closer look at your suggestion and the code. It looks like sub_token.py is an example of embedding ranges like this within a target. But following on the original feedback here, wouldn't we also need to support "and" so I can say "take row 30 past 31 and 40 past 42"? Would you want to allow all this to be represented within the line number target, i.e. nest all this in an "elements" field adjacent to "type": "lineNumber"? Apparently subtokens don't support this.

I'm not familiar with the extension-side code yet, but I would have thought that it would be cleaner to leave the grammar as-is (the above example already parses) and have the extension fix the implicit marks.

Aug 16 '22 17:08 wolfmanstout

I view "and" as a nice-to-have, so I'm not sure we should block the PR on adding this, but cc @AndreasArvidsson who actually uses line numbers

Aug 16 '22 18:08 pokey

I think that and is probably more useful than through but it's not a showstopper for me. We can always follow up with that change later.

Aug 16 '22 20:08 AndreasArvidsson

Fair enough. @wolfmanstout if you do decide to take a crack at "and" in this PR, I think your proposal sounds reasonable. I'd make it so that lineNumber marks always use elements, and just use a degenerate single-element list when the user doesn't say "and", similarly to how we use a degenerate range when they don't say "past"

Aug 16 '22 20:08 pokey

Makes sense, I'll give it a try and let you know if I hit any blockers. Thanks folks.

Aug 16 '22 21:08 wolfmanstout

Okay, I have draft ready for you to look at. I haven't added any tests yet; I'm guessing I should do some recorded tests. Here are my main questions:

You'll see that I have a couple alternative ways of implementing the extension side of things, one of which is commented out. The first (uncommented) way is perhaps a little hacky in that it synthesizes a standard TargetDescriptor and reuses the processTargets function to handle all the logic for range types and exclusion. The second way uses more finely-scoped functions, which avoids synthesizing a TargetDescriptor. It also duplicates a lot of the logic for handling ranges, though, and it doesn't even support vertical ranges yet. Personally, I strongly prefer the first way as it reuses more logic, and is actually pretty readable because it explains exactly how the line number mark is mapped to the standard way of declaring ranges and lists.
In order to maintain backwards compatibility, I changed the existing anchor/active to add a ? so they may be unset. I kept the extension-side logic for handling this as-is (fixing a minor bug along the way), and I found that this served as a convenient "base case" to support my proposed implementation above, avoiding infinite recursion when calling processTargets. Alternatively, I could check if elements.length == 1 and elements[0].anchor is equal to elements[0].active. My current approached seemed a little more convenient, but let me know if you have a strong preference.
I decided to go all-in on using explicit degenerate values instead of optionals that are resolved later. In other words, none of the fields in my new LineNumberRange interface are optional. This does mean that more default values have to be set Talon-side, such as "continuous" if no range type is spoken. Let me know if you would prefer a different approach.

Aug 18 '22 22:08 wolfmanstout

@pokey Checking in on this before my free time completely evaporates in a few days (parental leave ending, in-laws leaving). Any thoughts on the above questions?

Aug 26 '22 21:08 wolfmanstout

I like your uncommented approach.

I don't like the idea of keeping both old and new version of lineNumber, though. I'd argue for a migration, and only supporting the new form.

I do have somewhat mixed feelings about these line number structures being supported within the api between cursorless talon and vscode. I'm tempted to do the mapping Talon-side, because it does add complexity to our api surface. @AndreasArvidsson wdyt?

I think this one might warrant another discussion in a meet-up

Aug 29 '22 23:08 pokey

I think I agree on both counts. Maybe we should go through this pr in more detail in next meet up?

Aug 31 '22 11:08 AndreasArvidsson

I'm gonna be very busy for at least the next few weeks so I may not be able to make it to a meet up soon. I'd say don't wait for me: feel free to discuss, share notes, and even implement the changes if you'd like. I'm sure I'll eventually have time to implement them but I don't know when that will be.

Aug 31 '22 14:08 wolfmanstout

@wolfmanstout Sorry this has taken so long but we have now made some provisions in pr #960 that should be useful here once it merged.

https://github.com/pokey/cursorless-vscode/blob/b52f9ca28ef3989e3f02e22c91cd80ca6d8b20dd/src/typings/targetDescriptor.types.ts#L32-L46

Sep 28 '22 07:09 AndreasArvidsson

cursorless cursorless copied to clipboard

Accept line number range without repeated direction word

Checklist

cursorless
cursorless copied to clipboard