winit icon indicating copy to clipboard operation
winit copied to clipboard

New IME API is too restrictive

Open dhardy opened this issue 1 month ago • 23 comments

In short, I want to use the "cursor area" capability, but provide the area through an update later. The old v0.30.x IME API allowed that and it worked.

With the new API:

    let capabilities = ImeCapabilities::new().with_hint_and_purpose().with_cursor_area();
    let data = ImeRequestData::default().with_hint_and_purpose(hint, purpose);
    let req = ImeEnableRequest::new(capabilities, data.clone()).unwrap();

fails to unwrap() because I haven't provided the cursor area yet. Alternatively if I don't unable .with_cursor_area() then I get errors later:

2025-11-17T14:30:07Z WARN  winit_wayland::seat::text_input] discarding IME cursor area update without capability enabled.

Given that hint, purpose and surrounding text are widget-level details (content) while position is a window-level detail (a widget has no idea what translations may have been used), I find it surprising that both constructs are tied into the same "update". Essentially it forces me to keep a copy of the widget-level details at the window-level.

Also, why can ImeEnableRequest::new fail in the first place? Why even require that capabilities be specified explicitly if the specification requires that they match provided data exactly — can we not just implement From<ImeRequestData> for ImeEnableRequest?

dhardy avatar Nov 17 '25 14:11 dhardy

Note: currently my code recalculates the "cursor area" every frame simply because there is no easy way to know whether any widget translation (scroll position) has changed, so though it would be possible to calculate the position initially, there really is no point.

I could work on a PR here, but I'm not quite sure what direction is desired.

dhardy avatar Nov 17 '25 14:11 dhardy

The goal is to explicitly provide all the data needed for the enable request. For the updates, you don't have to do that.

The reasoning is that some platforms don't really work if you don't provide all upfront, so winit was filling it for you before with some defaults, now you have to kind of tell winit about all of that, which means that you actually can have the IME popup be disabled.

kchibisov avatar Nov 17 '25 15:11 kchibisov

To be more clear. On wayland if you don't send it by the time you enable IME, you won't have position used at all, so this API kind of solves it, since before it was not guaranteed to work with some compositors, now semantics are kind of encoded.

kchibisov avatar Nov 17 '25 15:11 kchibisov

So we have to go with the most limited API and provide bogus data to everything else?

How about allowing a capability/data mis-match, but filling in bogus data on only the platforms which require it? Thus on Wayland, winit may invent position & size data if not provided; otherwise only the provided data is used.

dhardy avatar Nov 17 '25 15:11 dhardy

The point is to identify that we have to fill in the first place. Like surely, we can derive bogus data based on capabilities, that's fine with me, but we need caps in the first place to deal with "want/not want" IME functionality.

Tbf, this was all done to prevent user errors early on, since once they figured out, they unlikely to make mistakes, rather than write IME just badly.

kchibisov avatar Nov 17 '25 15:11 kchibisov

How about allowing a capability/data mis-match, but filling in bogus data on only the platforms which require it? Thus on Wayland, winit may invent position & size data if not provided; otherwise only the provided data is used.

And again, while it's not that ergonomic, it's mostly one time issue. We certainly can fill bogus data and log warnings if you do so instead of hard failing in those builders.

kchibisov avatar Nov 17 '25 15:11 kchibisov

So the purpose is to prevent mis-use of a feature which likely isn't going to be well tested... perhaps a good idea, though it's still easy enough to write a bad implementation.

I was more thinking so that the API doesn't force us to provide bogus data on other platforms, but I suppose it's not really an issue.

dhardy avatar Nov 17 '25 15:11 dhardy

We certainly can fill bogus data and log warnings if you do so instead of hard failing in those builders.

This has ~~two~~ three advantages:

  • A nicer API
  • Not forcing users to handle errors in ImeEnableRequest::new (which should IIUC never fail unless misused)
  • Platforms which use split requests do not need to pass bogus data

dhardy avatar Nov 17 '25 15:11 dhardy

Ultimately the biggest issue remains that many devs have no idea how to use/test IME and accessibility tools. I've installed IBus, but there's no configuration to use IME with English (or any language I know), because it's not needed. So I'm pressing keys, seeing Japanese glyphs, and have no idea what I'm doing.

In the previous Kas version, IME is enabled by providing an ImePurpose in request_key_focus, then calling set_ime_cursor_area after Event::ImeFocus (sent on WindowEvent::Ime(Ime::Enabled)) is received. This appeared to work in my testing (Wayland / IBus).

dhardy avatar Nov 17 '25 15:11 dhardy

So, I read the Wayland zwp_text_input_v3 protocol.

Are hint and purpose expected to change or allowed to remain constant? The inclusion in ImeRequest::Update hints that they are allowed to change. The protocol states that changes to these values may have no effect. So perhaps better assume not.

The protocol notes that surrounding text is double buffered, but that if it is not supplied initially then it may be ignored later. Implication: it is not necessary to provide a fresh copy of the surrounding text every time something else (like the position) changes.

The protocol also clarifies the following regarding the surrounding text (not clear from the winit API/docs):

The client should notify the compositor of any changes in any of the values carried with this request, including changes caused by handling incoming text-input events as well as changes caused by other mechanisms like keyboard typing.

The protocol states that set_text_change_cause must be called after any change to text, or the cursor or selection positions. It doesn't look like winit ever calls this, leading (if I understand correctly) to the compositor assuming that all changes are caused by the IME.


Suggested new API:

pub mod ime {
    // Use existing ImeHint, ImePurpose, ImeSurroundingText types but drop prefix

    pub struct InitialData {
        hint: Hint,
        purpose: Purpose,
        surrounding_text: Option<SurroundingText>,
    }
    
    impl InitialData {
        pub fn new(hint: Hint, purpose: Purpose) -> Self { ... }
        
        pub fn normal() -> Self {
            Self::new(Hint::default(), Purpose::Normal)
        }
        
        pub fn set_surrounding_text(&mut self, text: SurroundingText);
    }
    
    pub enum ChangeCause {
        InputMethod,
        Other,
    }
}

impl Window {
    pub fn request_ime_enable(&self, data: ime::InitialData);

    pub fn set_ime_surrounding_text(&self, text: ime::SurroundingText, cause: ime::ChangeCause);

    pub fn set_ime_cursor_area(&self, pos: Position, size: Size);
}

This does not permit changing hint and purpose later which I think is a reasonable restriction.

It does suggest surrounding text be specified initially and should maybe just emit a warning if set_surrounding_text is called later without initial specification (to ensure uniform behaviour on all backends).

It requires specifying the change cause when updating surrounding text.

It allows setting the cursor area at any time without repeating other details (very useful if scrolling or other transforms cause the area to move).

(I could work on a PR but won't without unless the proposal is accepted.)

dhardy avatar Nov 18 '25 12:11 dhardy

cc @dcz-self

kchibisov avatar Nov 18 '25 12:11 kchibisov

Are hint and purpose expected to change or allowed to remain constant?

The protocol states that changes to these values may have no effect.

Unless you send them out in the initial update, in which case there's nothing stopping them from changing.

It doesn't look like winit ever calls [set_text_change_cause]

Yeah, that Wayland API is pretty useless already and will likely get deprecated in the next release.


This does not permit changing hint and purpose later which I think is a reasonable restriction.

I'm far from an Android expert, but Android seems to make hint constant by default by embedding in xml files. I can't find out what Apple does. Ultimately it depends on where on the spectrum between "lowest common denominator" and "options for every backend" winit wants to be.

It does suggest surrounding text be specified initially

This instantly breaks deleting text by bytes. We do have a fallback for this case, but I think it's nonsense: text inputs will either know what they contain (99% cases) or not (terminal emulators).

Regarding cursor area, what is the main reason for not providing it? Sure, tracking the surface-relative position is work. But making UIs is work, and if you don't provide the value, how is the input method supposed to position the candidates window? Not providing cursor position immediately on focusing the text field degrades user experience.

dcz-self avatar Nov 18 '25 13:11 dcz-self

Regarding hint and purpose, they do not look like the kind of thing which should change, but if this assumption proves to be wrong then the API can be extended later to accommodate this. But going with this assumption for now is the simplest approach and I would guess sufficient.

Yeah, that Wayland API is pretty useless already and will likely get deprecated in the next release.

Ah. Thanks for the insight.

It does suggest surrounding text be specified initially

I didn't mean only initially: I mean surrounding text should be specified initially as a pre-requisite to providing updates. (This can't really be codified into an API, but winit can warn if updates are provided without an initial value).

Regarding cursor area, what is the main reason for not providing it?

The question is more: why should the cursor area be specified immediately when specifying it through updates has the same effect? I see no real purpose to including this in the enable request.

dhardy avatar Nov 18 '25 14:11 dhardy

why should the cursor area be specified immediately when specifying it through updates has the same effect

To avoid the situation where the input method has no place to put the candidates window and has to start an alternative mechanism or give up. To avoid the situation where the input method UI transforms while typing because the position became available.

And no, it doesn't have the same effect. I wrote in the protocol that if it's not initially present, the input method may ignore it (that's on purpose; not having a cursor needs a pretty unusual use case, or legacy software, but I didn't want to make it outright impossible).

Taking off my protocol designer hat off, and putting my library designer hat on:

If you don't provide the cursor area in the initial request, a compliant input method may ignore it completely, providing a degraded user experience. I recommend making it difficult for API users to not provide the position from the beginning.

I mean surrounding text should be specified initially as a pre-requisite to providing updates.

That's all right then. Isn't that the case already?

As for the change cause... Actually, it might make sense to implement. It's still a protocol violation today, and future is hard to predict. Pretty much any idea about strict synchronization I can come up with needs a similar flag. If synchronization is ditched, it can just be ignored, API-wise.

dcz-self avatar Nov 18 '25 14:11 dcz-self

@dcz-self you make a good argument for the initial provision of size and position.

But, considering the protocol requires that surrounding-text be double-buffered, I believe I have a good argument that it should be possible to update position and surrounding-text independently.

I dislike how fn request_ime_update aggregates all operations (and resulting errors) into a single function.

Is there a good rationale for ImeRequestError::AlreadyEnabled (instead of silently replacing the existing IME)? I don't think it helps the backend design and I'm not convinced that it leads to better downstream applications.

Updated API suggestion (including errors, and cursor_area in InitialData):

pub mod ime {
    // Use existing ImeHint, ImePurpose, ImeSurroundingText types but drop prefix
    
    #[non_exhaustive]
    pub enum EnableError {
        NotSupported,
    }
    
    pub struct NotEnabled;

    pub struct InitialData {
        hint: Hint,
        purpose: Purpose,
        surrounding_text: Option<SurroundingText>,
        cursor_area: Option<(Position, Size)>,
    }
    
    impl InitialData {
        pub fn new(hint: Hint, purpose: Purpose) -> Self { ... }
        
        pub fn normal() -> Self {
            Self::new(Hint::default(), Purpose::Normal)
        }
        
        pub fn set_surrounding_text(&mut self, text: SurroundingText);
    }
    
    pub enum ChangeCause {
        InputMethod,
        Other,
    }
}

impl Window {
    /// Potentially fallible
    pub fn request_ime_enable(&self, data: ime::InitialData) -> Result<(), ime::EnableError>;
    
    /// Infallible
    pub fn disable_ime(&self);

    /// Updates surrounding text
    pub fn set_ime_surrounding_text(
        &self,
        text: ime::SurroundingText,
        cause: ime::ChangeCause,
    ) -> Result<(), NotEnabled>;

    /// Updates size and position
    pub fn set_ime_cursor_area(&self, pos: Position, size: Size) -> Result<(), NotEnabled>;
}

This differs from the current code on:

  • forces initial specification of hint and purpose (both of which have reasonable defaults)
  • error handling
  • ImeCapabilities (not very useful in my opinion, but I will include if you insist)
  • disaggregate ImeRequest
  • independent updates for surrounding-text and cursor-area

Taking off my protocol designer hat off, and putting my library designer hat on:

Any suggestions what IME to test with (preferably over Wayland)?

Can you think of other reasons that enabling IME might fail besides NotSupported?

dhardy avatar Nov 19 '25 08:11 dhardy

But, considering the protocol requires that surrounding-text be double-buffered, I believe I have a good argument that it should be possible to update position and surrounding-text independently.

You're correct, and that's already possible, except it's discouraged in the enabling update. Take a look here:

pub struct ImeRequestData {
    pub hint_and_purpose: Option<(ImeHint, ImePurpose)>,
    pub cursor_area: Option<(Position, Size)>,
    pub surrounding_text: Option<ImeSurroundingText>,
}

All of those are Options. The idea is that if you skip something, the previous value stays because of the double buffering. Perhaps this should be communicated better: when implementing this pattern in smithay for the umpteenth time, I started calling those structs "Update", to clearly distinguish that they are changes, and not states.

Where your API proposal seems to fall short is updating multiple properties atomically (disaggregating ImeRequest). As you mentioned, there's no need to update things together if values are double-buffered, because changing them separately leads to the same state. But the Wayland protocol is built in a way where frame-perfect updates of screen content are possible. So the text input client can bunch together related changes (change text, alter the cursor), so there's no risk that one update gets applied in this frame, and another gets delayed into another. Your API idea would still work, but it would sacrifice this property - unless I'm missing some way to make updates atomic.

Any suggestions what IME to test with (preferably over Wayland)?

Squeekboard supports the current protocol version, but it only inputs events. I created my own SUPER MESSY demo IME for ongoing development: https://codeberg.org/dcz/stiwri although I think I never implemented the protocol without extensions.

Is there a good rationale for ImeRequestError::AlreadyEnabled (instead of silently replacing the existing IME)?

I don't remember... It's possible no one gave it any thought. If you replace the IME, you have to reset the double-buffered state, so you end up sending a disable-update and an enable-update, so it's really the same. Speaking from my today's PoV, I think it's better to that discarding state is explicit and can't be done by accident.

Can you think of other reasons that enabling IME might fail besides NotSupported?

Not really. To be honest I'm not sure if NotSupported is completely necessary. It's more of a courtesy to the appliction developer I guess.

dcz-self avatar Nov 19 '25 09:11 dcz-self

All of those are Options. [..] Perhaps this should be communicated better

Yes — in particular, that fn request_ime_update aggregates all types of request behind a single method with a single error type is a poor communicator. At the very least there should be three methods: enable, disable, update.

disaggregating ImeRequest

This refers to the above, not atomic updates.

I don't remember... It's possible no one gave it any thought.

Something downstream of this API needs to track what text object an IME session is associated with, and while that can certainly disable-before-re-enabling IME, I'm not sure that forcing this will lead to fewer bugs overall.

It's not very important but does complicate the error types somewhat unnecessarily.

I'm not sure if NotSupported is completely necessary.

This is a winit API, not just Wayland. The web and Orbital backends report NotSupported.

If there are no other reasons, I wonder if the enable error should use #[non_exhaustive] or even an enum. Because downstream users can handle known errors while unknown ones essentially force a log-and-panic or log-and-ignore handler.

dhardy avatar Nov 19 '25 09:11 dhardy

At the very least there should be three methods: enable, disable, update.

No opposition from me.

I'm also easy about changes to the behaviour of enable(). I don't think it's a good idea to accept multiple enables in a row, but I already expressed my concerns. Ultimately I'm not a maintainer.

The web and Orbital backends report NotSupported.

What I meant is that there's functionally no difference between the backend not supporting an IME and no IME being actually active. So to me, having a NotSupported error is just a courtesy, not a requirement.

I wonder if the enable error should use #[non_exhaustive]

Do you expect more errors to appear in the future?

dcz-self avatar Nov 19 '25 10:11 dcz-self

Enablement is confirmed by Ime::Enabled, so enable does not need to return an error at all. There is a small benefit to this design: encourage users to react to the Ime::Enabled command and not proactively when enable_ime returns "Result::Ok".

dhardy avatar Nov 19 '25 11:11 dhardy

Squeekboard supports the current protocol version, but it only inputs events. I created my own SUPER MESSY demo IME for ongoing development: https://codeberg.org/dcz/stiwri although I think I never implemented the protocol without extensions.

Squeekboard says it requires virtual-keyboard-v1 and recommends input-method-v2. Stiwri apparently uses zwp-input-method-v2. Meanwhile winit is using zwp_text_input_v3 and zwp_text_input_manager_v3. So it looks like all of these are using incompatible protocols.

dhardy avatar Nov 22 '25 07:11 dhardy

No, this is not how it works. text_input is for client, input-method is for server. The flow is text_input <-> compositor <-> input-method-v2|virtual-keyboard-v1.

kchibisov avatar Nov 22 '25 07:11 kchibisov

I started implementing this based on my proposal above.

Adjustments:

  • Merge updates into an update_ime method.

Questions:

  • Do we want to allow hint and purpose to be modified in an update? examples/ime.rs actually does this, but I don't know if this has any real-world applications. @dcz-self mentioned that this is probably not possible on Android.
  • Should updates report NotEnabled errors or fail silently (never return an error)?
  • Is fn Window::ime_capabilities useful? To me it seems not since only the thing which enabled IME should provide updates and it should know the capabilities requested.

dhardy avatar Nov 25 '25 14:11 dhardy

I have a real-world application: for multi-purpose query fields, like the URL/search bar in the browser. The application could notify the user whether completing input is going to navigate to the URL directly by "URL" hint (the IME renders submit button as "go"), or start a search with "search" hint (the IME renders the submit button as "search").

I drafted the actions protocol under that assumption. But the same could be implemented in another, dedicated way on the protocol level.

dcz-self avatar Nov 25 '25 15:11 dcz-self