Copy-paste of UTF-8 text containing the null byte does not work.
Description of the bug
Copy-pasted text containing the null byte enters odd ? characters.
Steps to reproduce
- Select some text that contains the null byte.
- Copy with Ctrl-C.
- Paste with Ctrl-V.
Expected behavior
Pasting the text should paste the copied null bytes as well.
Actual behavior
Pasted null bytes are replaced with ? characters:
Sublime Text build number
4200
Operating system & version
Windows 10
(Linux) Desktop environment and/or window manager
No response
Additional information
No response
OpenGL context information
Lucky you. Text is truncated at first NULL byte on Linux.
Appears to be a re-appearing regression, which was fixed before by https://github.com/sublimehq/sublime_text/issues/5443 - or at least falls into same category.
When I tested this first (I'm on Windows), in Sublime Text version 41?? (I don't remember the last two chars), I did first see that the paste got truncated at the null bytes.
Then I updated to the latest version 4200 and the truncation changed to pasting the diamond ? chars instead.
Those characters indicate encoding/decoding issues. To be fair, Windows Notepad fails on it as well, but in other ways. It just replaces NUL bytes with 0x20.
There are probably reasons to work with base64 encoded representations of such data.
Duplicate of #5443. See https://github.com/sublimehq/sublime_text/issues/5443#issuecomment-2244266207 for an explanation of why replacing null characters is required.
The fact that Windows has a limitation does not mean there aren't a number of ways to fix this, as was already mentioned in the linked bug.
And you can see from the comment I linked an explanation as to why such workarounds do not work. If you have another workaround I'm all ears.
> And you can see from the comment I linked an explanation as to why such workarounds do not work.
After your comment, the next comment https://github.com/sublimehq/sublime_text/issues/5443#issuecomment-2244279669 mentions
"Well, an alternative solution could be using some obscure Unicode codepoint as a substitution."
which was not addressed in the thread.
For example, the NULL character could be replaced with the code point U+10FFFD from the Unicode Private Use Area, which is reserved for application-defined characters that have no standard meaning.
This would fix the use case of copy-pasting inside Sublime Text. Copy-pasting between Sublime Text and another app would not work, and that cannot be made to work, but being able to copy-paste inside Sublime Text is the big part.
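To make the substitution idea concrete, here is a minimal Python sketch. The names `NUL_PLACEHOLDER`, `encode_for_clipboard` and `decode_from_clipboard` are made up for illustration and are not part of Sublime Text or its plugin API; the sketch only shows the round trip on plain strings.

```python
# Hypothetical placeholder: U+10FFFD, a Private Use Area code point.
NUL_PLACEHOLDER = "\U0010FFFD"


def encode_for_clipboard(text: str) -> str:
    """Replace every NUL with the placeholder before handing text to the OS clipboard."""
    return text.replace("\x00", NUL_PLACEHOLDER)


def decode_from_clipboard(text: str) -> str:
    """Turn placeholders back into NUL when pasting inside the editor."""
    return text.replace(NUL_PLACEHOLDER, "\x00")


copied = "abc\x00def"
on_clipboard = encode_for_clipboard(copied)
pasted = decode_from_clipboard(on_clipboard)
```

One caveat of this approach: text that legitimately contains U+10FFFD would be turned into NUL on paste, so the placeholder has to be a code point that is not expected to appear in real files.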
A second solution for solving this would be to use a shadow copy-paste buffer:
- When copying with Ctrl-C: remember the "Sublime Text view" of the copy buffer in the Sublime Text app, call this `originalPasteBuffer`, and make a second copy of the buffer, with all the `00h` replaced by `\x00`, i.e. `5Ch 78h 30h 30h`, if the current text file encoding is UTF-8. Send this second copy, `escapedPasteBuffer`, to the system clipboard.
- Then pasting out to another application will receive `\x00` instead of `00h`, which would improve interchangeability with external apps when pasting Unicode text, e.g. in UTF-8 encoded program code (say, JavaScript or HTML).
- Then when pasting inside Sublime Text, `memcmp()` the to-be pasted memory block against `escapedPasteBuffer`. If it matches, then paste `originalPasteBuffer` instead.
This way nulls could be copied inside Sublime Text, and from Sublime Text out to another app.
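The shadow buffer steps above could look roughly like the following Python sketch. `ShadowClipboard` and its methods are hypothetical names, the system clipboard is represented only by the string that `copy()` returns, and a plain string comparison stands in for `memcmp()`.

```python
from typing import Optional


class ShadowClipboard:
    """Keeps the original text next to the escaped copy sent to the OS clipboard."""

    def __init__(self) -> None:
        self._original: Optional[str] = None  # originalPasteBuffer
        self._escaped: Optional[str] = None   # escapedPasteBuffer

    def copy(self, text: str) -> str:
        """Remember the original and return the escaped text for the system clipboard."""
        self._original = text
        # Each NUL becomes the four characters backslash, x, 0, 0.
        self._escaped = text.replace("\x00", "\\x00")
        return self._escaped

    def paste(self, system_clipboard_text: str) -> str:
        """Return the original if the clipboard still holds our escaped copy."""
        if self._original is not None and system_clipboard_text == self._escaped:
            return self._original
        # Clipboard content came from elsewhere: paste it unchanged.
        return system_clipboard_text


clipboard = ShadowClipboard()
sent_to_os = clipboard.copy("a\x00b")
pasted_inside = clipboard.paste(sent_to_os)
pasted_foreign = clipboard.paste("text from another app")
```

Because the comparison is against the whole escaped buffer, text copied in another application that happens to contain a literal `\x00` sequence is pasted as-is, unless it is identical to the last buffer copied from the editor.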
I've reopened the issue since you're asking for this to work within ST. #5443 was asking for it to work with other applications, which as stated is impossible.