fix(web): segmentation event + Promise resolution ordering 🐵
A part of the epic #7324 line, and the first member PR of "phase 3" for gesture implementation... at least "in spirit."
It's important to make sure that the segmentation engine's events and Promises provide a usable interface with consistent ordering to the upcoming gesture-synthesis layer. While doing very early prototyping for said layer, I came to realize that there were a few issues that made their use - especially in unit tests - far more difficult than would be ideal.
To clarify the issues and ensure they're properly addressed, this changeset also adds a unit test heavily based on said "very early prototyping", the start of which may be seen as follows:
https://github.com/keymanapp/keyman/blob/62afd6df214566e1c5799ea897a1aeb0da7b2328/common/web/gesture-recognizer/src/test/auto/headless/trackedPath.js#L372-L374
Since almost all of the `Segment`s are produced and resolve asynchronously, a state-machine-based model for detecting gestures needs to process each segment as it arrives, one at a time. However, the start `Segment` and the first pending `Segment` are produced at the same time, synchronously; this meant that specialized handling would be needed to get the event for both `Segment`s.
Rather than requiring significantly complex handling external to `TrackedPath` and the segmentation engine, I believe the simplest solution is to provide both within a single 'raise' of the `'segmentation'` event, hence the change to the event's signature. This may slightly complicate the synchronous signatures at play, but it prevents much greater complexity when interfacing with the asynchronous aspects of the design.
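For illustration, a listener under the revised signature might look like the following; the array-valued payload is inferred from the destructuring in the test code below, so treat the exact shape as illustrative:

```typescript
// Sketch: the event is assumed to deliver an array of Segments - two on the
// initial raise (the 'start' Segment plus the first pending Segment), then
// one per raise thereafter.
touchpath.on('segmentation', (segments: Segment[]) => {
  for(const segment of segments) {
    // Queue each Segment for sequential async processing.
  }
});
```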
It is then possible to handle each segment in sequence via `async` / `await` as follows:
```typescript
let segmentEventPromise = () => new Promise<Segment[]>((resolve) => touchpath.once('segmentation', resolve));

// ...
// Use the segment at index 1 for the initial `segmentation` event; index 0 otherwise.
let segment: Segment; // corresponding to the current "link" in the Promise "chain"

while(true) {
  // State-machine decisions are first possible once the segment type is 'recognized'.
  const segmentType = await segment.whenRecognized;

  // Must be set _before_ the awaits below; the resolution signal and the next segment happen
  // synchronously (in the same 'message' of the main message-loop).
  //
  // Skipped when `segmentType` == 'end'; the chain ends in that case.
  const nextPromise = segmentType != 'end' ? segmentEventPromise() : null;

  if(transitionOnlyDependsOnSegmentType) { // placeholder condition
    // Do state-machine transitions based on the new segment's type if `segmentType` is all that matters.
  } else { // if the state-machine transition depends on a property only determined at resolution time:
    // The current Segment will always fully resolve before its successor is emitted.
    await segment.whenResolved;
    // Do the state-machine transition here.
  }

  if(segmentType == 'end') {
    break; // The chain ends with the 'end'-typed Segment.
  }

  [segment] = await nextPromise; // Shifts to the next "link" in the chain.
}
```
@keymanapp-test-bot skip
User Test Results
Test specification and instructions
User tests are not required
... right, I didn't actually double-check for browser-based unit-test breakage; I only focused on the headless-mode tests. The failed test run for Web is nigh-certainly a "mea culpa."
> I do wonder if we need to be using promises like this. Can't we just check the gesture state machine on a (a) touch event, or (b) timeout?
Even should the rest be simplified... (b)'s one aspect of what the `Promise`s are for. It's not difficult to wrap a timeout call with a `Promise`. At the most basic level:
https://github.com/keymanapp/keyman/blob/9e17e1802f6512b77e9712877876b8f4f59158da/common/web/gesture-recognizer/src/tools/unit-test-resources/src/timedPromise.ts#L8-L19
And really, there's no need for that `func` param, now that I look at it. That can safely be placed on the `.then` clause. A bit more work on the above code would lead to support for timeout cancellation as well.
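For reference, the basic pattern with the `func` param dropped would look something like this (a sketch, not the linked implementation itself):

```typescript
// A bare-bones Promise wrapper around setTimeout; follow-up work chains via
// .then instead of arriving through a callback parameter.
const timedPromise = (timeout: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, timeout));

// Usage: re-check gesture conditions after a delay.
timedPromise(500).then(() => {
  // e.g. evaluate longpress / flick state here
});
```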
Given our current discussions, I hope you don't mind if I return this to draft to take it out of the review queue temporarily?
Took some work, but I've successfully revived the branch; it's now based on the modularized form of the gesture engine.
This does nothing to address previous review concerns; I just figured that getting this up and running before making further changes would probably be wise; it's best to start new work from a stable position, after all.
This (and the 🐵 / gesture work) has been dormant for quite a while, so it'll probably take a bit to fully reboot the discussion and our shared understanding.
If memory serves, my perspective was that I needed to look more at the "forest" of the prior discussion than its "trees"; the general concern as opposed to the details.
- The main concern, I believe, was the code/development complexity of the subsegmentation algorithms and such.
  - In particular... how it was costing a fair bit of time to develop.
  - We did have to delay the release of gestures past 16.0, after all, so it's clearly a concern with merit.
- I know that there were also concerns regarding the math.
  - I feel like this is more of a "tree" - a detail - than the real focus, though.
  - If we drop to assuming anything between 'touch' and 'release' is a single segment, then we get just linear regression, which is only undergrad-level stats / linear algebra. (See the sketch after this list.)
    - The existing `CumulativePathStats` class is designed to do this live as new input samples arrive.
  - The stats do get more 'involved' when we drop the last point's assumption and utilize subsegmentation as I had been doing.
    - These 'more involved' stats are just a more advanced version of regression - "segmented regression" - along with related bog-standard statistical tests that anybody with a Master's in stats should find familiar.
    - The "least standard" parts would be some of the details unique to segmented regression, but at least those are pre-existing and pretty well documented.
    - These 'fancier' bits are found within `PathSegmentation` and its related classes.
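For a concrete picture of what "live" regression means here, a rough sketch of incremental least-squares accumulation - this mirrors the concept only, not the actual `CumulativePathStats` API:

```typescript
// Accumulate running sums as each sample arrives, so the best-fit line is
// available in O(1) at any point along the path.
class RunningRegression {
  private n = 0;
  private sumX = 0;
  private sumY = 0;
  private sumXX = 0;
  private sumXY = 0;

  // Called once per incoming touchpath sample.
  add(x: number, y: number) {
    this.n++;
    this.sumX += x;
    this.sumY += y;
    this.sumXX += x * x;
    this.sumXY += x * y;
  }

  // Ordinary least-squares slope of y against x.
  get slope(): number {
    const denom = this.n * this.sumXX - this.sumX * this.sumX;
    return (this.n * this.sumXY - this.sumX * this.sumY) / denom;
  }

  get intercept(): number {
    return (this.sumY - this.slope * this.sumX) / this.n;
  }
}
```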
Looking back, and looking at the code that exists, I'm fairly confident that I can shift gears and go with a "single segment" approach for now. Makes me a bit sad, as I'd actually gotten the more advanced style's foundations working, but I suppose the "more advanced style" would take more time to vet properly.
That said, there are still a few concerns and points of difference worth talking about.
- We generally aim for strong backward compatibility and cross-version consistency if at all possible.
- I'd like to assert a corollary here: Anything new we add should not be too restrictive on what we do in the future.
- We've generally been quite averse to "breaking changes"... and even to changes that require compiled checks in keyboards like "if in older version, do X instead of Y".
This was one of the big contention points, I think. Part of the original gesture design-work aimed to ensure that we can reasonably extend to multi-segment flicks and/or swipe-typing in the future. It may be a "while off", of course.
The point I'd like to make here: any expectation we set here for how certain gestures operate will continue to be an expectation if and when we do add the more complex ones.
- This showed up most prominently when talking about flicks.
- My perspective: if we're overly lenient with single-segment flicks, anything that causes us to retract that leniency will cause a 'break' in backward compatibility from the perspective of users.
- I'm fine with prohibiting multi-segment flicks.
- However, we've noted community demand at points for swipe-typing:
- #8233
- https://github.com/keymanapp/keyman/issues/5029#issuecomment-1093838133
- Many in the western world (with access to good predictive text, admittedly) have a preference in this direction, so I wouldn't be surprised to see more requests in the future.
- There's also the natural desire for good UX when it comes to gestures.
- Accessibility - how usable is it for people with less mobility / dexterity?
- Consistency - how reliable is it / how easy are the patterns to learn?
- ... potential for irritation - how much / little does it "get in the user's way" or require them to "go out of their way"?
And these points come to a head when we're talking about flicks.
My position:
- We should be restrictive with what may be interpreted as a flick, keeping in mind potential future gesture types.
- Allowing something to act as a flick now when it could not be interpreted that way later if a desired extra gesture type were added... that feels dangerous.
- It's hard to get this 100% right at this point, of course.
- Flicks are normally marked as an accessibility concern; if it can be typed via flick, it should also be typeable via a more "accessibility-friendly" pattern elsewhere within the keyboard.
- Longpresses are also a concern, but they're likely "more accessible" than flicks and they have no "max time" threshold.
- Accidentally-started flicks should be rare. (Keep 'false positives' low!)
- People get bumped slightly, type while they're walking, etc.
- The existing "roaming touch" behavior is itself an accessibility feature - it enlarges key caps, after all.
Your position, I think:
- For accessibility reasons, we should be more forgiving with what can be interpreted as a flick.
- The boundary between "roaming touch" vs "flick" is too blurry to allow both for the same keyboard.
- ... flicks are more important than roaming touches?
- This is just an inference, admittedly.
> Now, if we decide we want to be fancier on this, we still only need a single function call on the array of captured points between touch start and touch release, which returns the direction of the flick.
I believe we need to be "fancy" in this regard. We should validate that flicks follow a relatively straight line. Suppose the touchpath bends significantly - say, 15 pixels straight North followed by another 15 pixels West.
- if doing multi-segment flicks, that's straight-up a N->W flick.
- if allowing swipe, those are probably two separate components of a swipe. (It could be 'f' -> 'r' -> 'e' on a US keyboard, for example.)
Even worse - 15 pixels straight North, then 20 pixels straight South... for a net of 5 pixels South. Is this reasonable as a "flick to the South"?
- Fortunately, this is as far from single-segment straight as possible, so it'd be very easy to rule out as a legal flick.
Either way, the two sequences above are far better suited to either potential future gesture type; it'd be awkward to have to backtrack a decision to accept the first as a single-segment NW flick in the future due to backward-compatibility/consistency concerns.
Fortunately... we can handle this reasonably with single-segment regression tests for now; "is this a straight line" is a pretty easy question for our segment stats-tracking mechanism to answer.
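For illustration, one simple proxy for "is this a straight line?" compares net displacement against total arc length; the engine's actual check is regression-based, so treat this as a stand-in:

```typescript
// The ratio of net displacement to total arc length is 1.0 for a perfectly
// straight path and drops sharply for bent or doubled-back paths
// (e.g. 15px N + 20px S gives 5 / 35 ≈ 0.14).
interface Point { x: number; y: number; }

function straightness(path: Point[]): number {
  let arcLength = 0;
  for(let i = 1; i < path.length; i++) {
    arcLength += Math.hypot(path[i].x - path[i-1].x, path[i].y - path[i-1].y);
  }
  const start = path[0], end = path[path.length - 1];
  const netDisplacement = Math.hypot(end.x - start.x, end.y - start.y);
  return arcLength ? netDisplacement / arcLength : 1;
}
```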
One possible point of concern:
> Any macro movements from the starting point would cancel the timeout, which then cancels the longpress, and a new timeout can be started for a potentially different longpress to cater for roaming touch.
... and when does a 'macro movement' end? That's actually what subsegmentation is designed to detect and answer.
Our existing answer in KMW 'til now: when a new key ends up underneath the touchpoint, if we aren't yet locked in on a longpress, consider it a "macro movement".
- Do note the "for roaming touch" bit - this clearly assumes that no flicks are expected; the same motion would be very reasonable for a flick.
- It's not perfect, to be sure... but it is a decent approximation and can work for now.
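Sketched roughly, with hypothetical names rather than the actual KMW implementation, that existing key-boundary heuristic looks something like:

```typescript
// Hypothetical names throughout - not the actual KMW implementation.
let keyUnderTouch: string | null = null;
let longpressTimer: ReturnType<typeof setTimeout> | null = null;

function onTouchMove(x: number, y: number, keyAt: (x: number, y: number) => string) {
  const newKey = keyAt(x, y);
  if(newKey != keyUnderTouch) {
    // A new key under the touchpoint counts as a "macro movement": cancel any
    // pending longpress and start a fresh one targeting the newly-hovered key.
    if(longpressTimer) {
      clearTimeout(longpressTimer);
    }
    keyUnderTouch = newKey;
    longpressTimer = setTimeout(() => showLongpressMenu(newKey), 500 /* assumed delay */);
  }
}

function showLongpressMenu(key: string) {
  // Placeholder for the longpress menu display.
}
```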
All of that last comment to say... I think it actually is possible to make the basic, earlier-targeted gestures work decently without the subsegmentation engine, and existing pieces are sufficient to fill in the gaps for now. In which case, I guess that means we continue to leave this PR iced 🧊 until we decide to revisit and extend to allow more complex gestures.
I am still concerned about our differences regarding one notable point, though:
> Speed of flick should not be considered -- I think we are going to run into usability issues if we make flicks dependent on speed.
It's a flick - there are going to be usability issues in one direction or another. The speed aspect is also part of the word's standard definition. If we don't consider speed at all, I believe it impossible to fully reconcile with roaming touches - even with a more complicated model.
Suppose a straight movement to the east for two keys' width on a touch device, then a 0.5 second pause.
- Option 1: that's a flick to the East
- possibly cancelled due to the pause after initial confirmation as a flick?
- Option 2: that's a roaming touch into a longpress two keys over.
Would your answer (re: option 1 or option 2?) differ if I said...
- the "straight movement" took 0.3 seconds?
- the "straight movement" took a full second?
- If the pause after the motion were 3 seconds instead of 0.5 seconds, would your answer change?
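To make the thought-experiment concrete, here's a purely hypothetical speed criterion; the names and threshold are invented for illustration, not drawn from the engine:

```typescript
// Purely hypothetical: classify a completed straight motion by average speed.
function classifyStraightMotion(netDistancePx: number, durationMs: number): 'flick' | 'roam' {
  const avgSpeed = netDistancePx / durationMs; // px per ms
  const MIN_FLICK_SPEED = 0.2;                 // assumed threshold, not a real engine value
  // Fast, committed motion reads as a flick; slow drift reads as a roaming touch.
  return avgSpeed >= MIN_FLICK_SPEED ? 'flick' : 'roam';
}
```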
> Once a roam has started, flicks and multitaps will not meet their starting conditions, so will naturally be ignored.
The problem being... if both are permitted, we probably can't confirm a roam until we've ruled out flicks. Speed would be a useful tool for detecting failed flicks in order to enter roaming mode.
> Roaming -- in initial implementation, I suggest these are only applicable if the keyboard doesn't have flicks. We may be able to tweak this in the future with distance thresholds -- not speed thresholds.
If someone uses fast flicks - like, in the span of 0.2 seconds, a quick motion - I believe it would be harder for that person to consistently constrain the distance to meet such a "max distance" threshold. You'd end up penalizing the people who would otherwise get the most efficiency out of the feature, accidentally creating a "max speed" threshold. I'd see that going one of two ways:
- Auto-cancels the input entirely -> "How did it miss my keystroke? Is this keyboard dropping inputs?"
- "I guess it's a roaming touch now?" -> Selects the key under the would-be flick's endpoint: "I didn't ask for that key."
Now, if we auto-cancel on reaching an OSK boundary or device-screen boundary... that's more self-evident and provides an easier-to-recognize pattern for cancellation. They're boundaries that don't "move" based on the original touchpoint.
I do think we can move forward with dropping subsegmentation without arriving at a full answer to this concern, at least. The existing stats measurements do facilitate tracking speed, after all; it's just a matter of whether we decide to 'tap' that info.
> I believe it impossible to fully reconcile with roaming touches - even with a more complicated model.
Indeed, which is why I have been saying drop roaming touches ... for the last however long.
> If we don't consider speed at all, I believe it...
This is a critical part of the quoted statement.
> Your position, I think:
> - [omitted]
> - ... flicks are more important than roaming touches?
>   - This is just an inference, admittedly.
I'm concerned about the relationship of accessibility to roaming touch & its key previews; this seems to be dropping a "more accessible" feature for a "less accessible" one, with no plans to reconcile them at any point. I do not believe reconciliation impossible... unless we drop flick speed as a factor in determining "flick or not?". This is part of why I've been adamantly lobbying for it as a flick criterion... for the last however long.
To be clear... it does seem like we'll need to drop roaming touch for 17.0 anyway for the first wave of flick implementation - though we could offer a "disable all flicks" option out of accessibility concern for affected users, which would allow restoring roaming touch to those in need of key previews.
Succinct through-line:
- key previews help with accessibility, especially on smaller phone form-factor devices
- because key previews are good for accessibility, we should keep them when and where possible.
- Therefore, if there's a reasonable way to reconcile key preview gesture requirements with flick gesture requirements, we should take that.
- I do not believe such reconciliation is possible unless we consider motion speed as a criterion for flick gestures.
- Given all points above within this list, we should thus consider motion speed as a criterion for flick gestures - to reconcile "shiny new feature" with "existing accessibility functionality".
Took a bit of digging, but I found a few reference points that've been sticking in my mind:
https://github.com/srl295/cldr/blob/ecf728ab5a4ccd5678ae08c2a949b2bebbfe37df/docs/ldml/tr35-keyboards.md#accessibility
> Some features, such as multiTap and flicks, have the potential to reduce accessibility and thus should be discouraged. For example, multiTap requires pressing keys at a certain speed, and flicks require a more complex movement (press-and-flick) beyond a simple tap. Alternatively, inclusion of accessible methods of generating the same outputs (for example, simple keys on an additional layer), should be considered.
That was added in https://github.com/unicode-org/cldr/pull/2360.
Even longpress is discouraged by the LDML keyboards doc, in fact:
https://github.com/srl295/cldr/blob/ecf728ab5a4ccd5678ae08c2a949b2bebbfe37df/docs/ldml/tr35-keyboards.md#element-key
> Attribute: `longPress="a b c"` (optional) (discouraged, see [Accessibility])
We've triaged the segmentation aspects of this until much later, if at all, opting for a different strategy in the meantime. Also, recent PRs have pulled in many of the other related aspects of this PR, such as the `TouchpathTurtle` improvements. We're good to go ahead and retire this PR.