[CRITICAL] Fix clipboard text operations and UTF-8 cursor handling
Edit: Improve structure This pull request effectively repairs the entire clipboard handling system in Nuklear. To achieve this, it addresses three separate bugs:
Note: see #840 for more details and a minimal reproducible example.
EDIT: This is fixed by later commits. Hiding this comment...
@PavelSharp Not sure, if the code is right or not, but I just wanted to note few things:
This is how pasting żółć looks like when done on master branch
(it puts 8 characters in total which seems broken)
and here is the same text, but on PR branch
(puts 4 characters as expected so we have an improvement)
So this one bug seems fixed, however, I have some problems with your PR. Whenever I try to paste the text into "edit box", it always resets the cursor (only visually). It also fails to copy and re-paste the very same text. See the video:
https://github.com/user-attachments/assets/fb46d1e9-3122-40d5-a4d4-b7daa22fdee3
Note that those issues do not appear on master, only on PR branch. So... there is either another bug, or your PR fixes and breaks something at the same time.
I'm running Nuklear/demo/sdl_renderer on Linux with Xorg
Hi, @sleeptightAnsiC!
Thank you for your comment - I really appreciate the feedback. I carefully reviewed your notes and started digging into the issue. While doing so, I actually discovered two fantastic bugs in Nuklear that (hopefully!) explain all the clipboard and UTF-8 handling problems. Let me show you what I found:
1 Cursor position issue
The problem is caused by an incorrect cursor offset in line 27194 (nk_textedit_paste).
Obviously, we want to move the cursor by the number of codepoints, not bytes.
So instead of:
state->cursor += len;
it should be:
state->cursor += glyphs;
2 Clipboard copy bug - this one is mind-blowing
Nuklear calls clip->copy only once (line 28172 in nk_do_edit), like this:
edit->clip.copy(edit->clip.userdata, text, end - begin);
The backend must provide its own implementation of this function.
But guess what? It expects the length in bytes, not in codepoints (I double-checked nk_glfw3_clipboard_copy and nk_sdl_clipboard_copy). So that’s the root cause! Luckily, it’s easy to fix.
Here’s the corrected version:
text = nk_str_at_const(&edit->string, begin, &unicode, &glyph_len);
text2 = nk_str_at_const(&edit->string, end, &unicode, &glyph_len);
if (edit->clip.copy)
edit->clip.copy(edit->clip.userdata, text, text2 - text);
After these changes, everything starts working properly, though this is still just a draft fix. Would you like to give it a try? 😇 As an optional step towards full unicode support, it will be useful to take another look at #829.
@PavelSharp Are you using LLM (like ChatGPT/Copilot) for writing your comments or/and patches ? I wasn't sure, if you did when I read some of your comments on other PR few days ago, but now I'm 99% sure you did use it on your last comment. https://github.com/Immediate-Mode-UI/Nuklear/issues/840 also feels LLM-generated. Please notice that LLMs make too many mistakes, and you should not use them without clear disclosure.
The issues that I mentioned in https://github.com/Immediate-Mode-UI/Nuklear/pull/841#issuecomment-3419762564 does not occur after https://github.com/Immediate-Mode-UI/Nuklear/pull/841/commits/3342ffca65d136406337fee255e60b71bc47fe12
But a possibility that you used LLM to fix those worries me a lot...
@sleeptightAnsiC I may use LLMs to help phrase comments more clearly, but all code and reasoning are my own. Obviously, LLMs are very good at the task of writing in beautiful English and code reviews. This technology saves time for low-responsibility tasks. Isn't that wonderful? Besides that, I always carefully review and verify any outputs before posting.
The problem is that https://github.com/Immediate-Mode-UI/Nuklear/pull/841#issuecomment-3419854697 fells like LLM read, found solution and responded, all on its own without human involved. I would rather read whatever you put into LLM than its pseudo-correct output.
This technology saves time for low-responsibility tasks. Isn't that wonderful?
It may save your time when writing this, but it wastes everyone's time when reading and validating it.
@RobLoach Please, be very careful when merging any of this.
@sleeptightAnsiC Concerns about LLM usage in this context appear to reflect a misunderstanding of how such tools are integrated into software development and debugging processes.
When working with complex code, some of us don't just ask an LLM a single question. They use it to help examine code, test ideas, and debug iteratively. Every choice is thoroughly validated.
Basically, LLMs are used as assistants for summarizing and presenting information. They do not replace human reasoning, verification, or the rigorous search for root causes.
The technical work behind these patches: analysis, testing, and verification is entirely done by me. Saying that an "LLM-generated style" implies a lack of human expertise seriously undervalues the time and professional skill dedicated to producing these issues and PRs.
Please, be very careful when merging any of this.
Such a statement is counterproductive and does not reflect the spirit of open source software development. Instead of such actions, it would be much better to focus on reviewing the proposed PR.
Something is really odd here (and I'm not talking about LLM concerns anymore, although I think my concerns are valid)
Per your comment under https://github.com/Immediate-Mode-UI/Nuklear/pull/852#discussion_r2500587803, if utf8/clipboard handling is broken everywhere (in every demo), this really feels like we have a regression here. What we should do first, is not trying to fix it, but instead we should find out why the regression has happened, what caused it and so on... Without this we may break something else, cause another regression, or run into risk that this thing will break again.
~~@PavelSharp did you try to git-bisect this issue before submitting PR/Issue ?~~
EDIT: Okey, fine... I think that the next Pavel's comment explains everything. Although, I still think this whole situation is a bit strange.
@sleeptightAnsiC So, let’s talk specifically about the clipboard copy bug. The root cause lies in these lines: https://github.com/Immediate-Mode-UI/Nuklear/blob/fcd64f85e54b9d35eee13347748d70fa4d2e134e/src/nuklear_edit.c#L284-L291
Here we clearly see that end - begin is passed as the len argument. Since b and e represent text selection positions in the edit buffer, end - begin gives us the number of Unicode codepoints selected.
However, as I said earlier, all backends treat len as the number of bytes, not the number of codepoints. And that mismatch is the core of this bug.
As for how old this issue is. It’s actually nearly 10 years old. I traced it back to the commit where version 1.0 was released. Before that point, Nuklear was still named zahnrad.
@RobLoach I think this PR is finally ready to be merged! All the requested changes have been addressed, and all merge conflicts have been fully resolved. Also, as mentioned in our Reviewers Guide (“Have at least one other person review the changes before merging”), an additional review and testing has already been completed by @sleeptightAnsiC in his comment. Thanks. With this, the clipboard behavior is now properly fixed across all existing backends
an additional review and testing has already been completed by sleeptightAnsiC in his https://github.com/Immediate-Mode-UI/Nuklear/pull/841#issuecomment-3419909743.
Wut? :flushed:
I didn't test the latest changes yet. The old comment doesn't really apply here, it's outdated.
I'll try to test this soon.
What in the ChatGPT :skull: ... I'll be happy to merge once we get the :+1:
Any nit-picks can be fixed as follow ups. Thanks again, all, for all the hard work here.
Given that this PR basically fixes something that was broken since forever, and I'm afraid a lot of people had to make some workarounds related to the bug, it might be safe to mark the fix as a "breaking change" and/or increase the major version (?)
I'm definitely NOT someone who should be reviewing this. A long-time contributor should take an action here and decide.
@rswinkle Would you mind taking a look? or maybe tag someone who you think is appropriate?
@sleeptightAnsiC
I didn't test the latest changes yet. The old comment doesn't really apply here, it's outdated.
I'm sorry, but the only real code change since that comment is just this small tweak:
return nk_str_insert_at_rune(str, pos, text, len); to return nk_str_insert_at_rune(str, pos, text, len) ? len : 0;
Everything else was just rebasing and updating the branch. Honestly, it felt like a tiny detail 🙂
I'll try to test this soon.
the only real code change since that https://github.com/Immediate-Mode-UI/Nuklear/pull/841#issuecomment-3419909743 is just this small tweak
Retested and everything seems fine.
Any nit-picks can be fixed as follow ups.
My only nitpick here is that it might be nice to have this change as one single commit (?) Those two commits don't really work correctly without each other, and the third commit is rebuilding the amalgamation, so it may confuse people. This way, if said change ever causes an issue for someone, it will be easier to find it with git-log or git-bisect. But this is just a minor inconvenience, and @RobLoach can easily solve it with "squash+merge", so no worries.
And my previous comment still applies https://github.com/Immediate-Mode-UI/Nuklear/pull/841#issuecomment-3533818168
Just to note, there have been a couple of previous attempts to address this:
- #764, as @sleeptightAnsiC pointed in his comment
- #543, where the author describes the same problem but only fixes it for the GLFW backend.
This PR resolves the root cause. Personally, I believe it's obviously that the copy callback should take the number of bytes instead of glyphs.