Phoenix
4-byte characters (emojis) in StaticText/TextCtrl/Button/ListCtrl/... labels or values cause string truncation
Operating system: Windows 11 23H2
wxPython version & source: pypi 4.2.1
>>> import wx
>>> print(wx.PlatformInfo)
('__WXMSW__', 'wxMSW', 'unicode', 'unicode-wchar', 'wx-assertions-on', 'phoenix', 'wxWidgets 3.2.2.1', 'autoidman', 'sip-6.7.9', 'build-type: release')
Python version & source: stock
python -VV
Python 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]
Description of the problem: When value/label strings on controls contain emojis (characters that take 4 bytes in UTF-8), the displayed string is truncated by one character for every such character it contains. This means that for every such character (e.g. emoji) in the string, one dummy character has to be appended to the end so that the visible text is not cut off.
Example: having labels with the following strings:
"Test 🐛"
"Test 🐛_"
"Test 🐛🐛_"
"Test 🐛🐛__"
"🐛🐛🐛 This should count to 5: 12345"
results in the following:
This happens at least for StaticText, TextCtrl, Button and ListCtrl. I haven't tried any others.
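A minimal sketch of the padding workaround described above, plus a small model of the truncation. Both helpers (`pad_label`, `simulate_truncation`) are hypothetical names and not part of wxPython, and the model rests on an assumption: that the control receives only as many UTF-16 code units as the Python string has code points. That would explain the symptoms, but it is not confirmed here.

```python
def count_non_bmp(text: str) -> int:
    """Count characters outside the BMP (code points above U+FFFF)."""
    return sum(1 for ch in text if ord(ch) > 0xFFFF)


def pad_label(text: str, pad: str = " ") -> str:
    """Stopgap: append one padding character per non-BMP character."""
    return text + pad * count_non_bmp(text)


def simulate_truncation(text: str) -> str:
    """Assumed model of the bug: keep only len(text) UTF-16 code units."""
    budget = len(text)  # code points, wrongly treated as a UTF-16 unit count
    kept = []
    for ch in text:
        units = 2 if ord(ch) > 0xFFFF else 1  # non-BMP needs a surrogate pair
        if units > budget:
            break
        kept.append(ch)
        budget -= units
    return "".join(kept)


if __name__ == "__main__":
    for label in ("Test 🐛", "Test 🐛🐛", "🐛🐛🐛 This should count to 5: 12345"):
        print(repr(simulate_truncation(label)), "->",
              repr(simulate_truncation(pad_label(label))))
```

Note that on a build where the bug is fixed the padding would be displayed as-is, so whitespace padding is probably the least intrusive choice, and the helper should be dropped after upgrading.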
Code example:
import wx


class TestFrame(wx.Frame):
    def __init__(self, parent):
        wx.Frame.__init__(
            self, parent, id=wx.ID_ANY, title="Test", pos=wx.DefaultPosition,
            size=wx.Size(500, 300), style=wx.DEFAULT_FRAME_STYLE | wx.TAB_TRAVERSAL
        )
        sizer = wx.BoxSizer(wx.VERTICAL)
        text1 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text1, 0, wx.ALL | wx.EXPAND, 5)
        text2 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛_", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text2, 0, wx.ALL | wx.EXPAND, 5)
        text3 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛🐛_", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text3, 0, wx.ALL | wx.EXPAND, 5)
        text4 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛🐛__", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text4, 0, wx.ALL | wx.EXPAND, 5)
        text5 = wx.StaticText(self, wx.ID_ANY, u"🐛🐛🐛 This should count to 5: 12345", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text5, 0, wx.ALL | wx.EXPAND, 5)
        self.SetSizer(sizer)
        self.Layout()


if __name__ == '__main__':
    app = wx.App()
    frm = TestFrame(None)
    frm.Show()
    app.MainLoop()
This seems to work fine when using � (U+FFFD), which is the last displayable character that encodes to three UTF-8 bytes, the last displayable character that encodes to a single UTF-16 code unit, and the last displayable character on the BMP:
It breaks when using 𐀁 (U+10001), one of the first characters outside the BMP (the very first is U+10000), which encodes to four UTF-8 bytes and to two UTF-16 code units (a surrogate pair):
(The font doesn't have a glyph for this character, but that shouldn't matter for this issue.)
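For reference, the boundary can be checked in plain Python without wx. This is only a sketch; `encoded_sizes` is a hypothetical helper name. It reports, for a single character, the number of code points, UTF-8 bytes, and UTF-16 code units.

```python
def encoded_sizes(ch: str) -> dict:
    """Sizes of a string in code points, UTF-8 bytes and UTF-16 code units."""
    return {
        "code_points": len(ch),
        "utf8_bytes": len(ch.encode("utf-8")),
        # utf-16-le emits no BOM, so every code unit is exactly two bytes
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
    }


if __name__ == "__main__":
    # U+FFFD (works), U+10000 / U+10001 (breaks), U+1F41B (the bug emoji)
    for cp in (0xFFFD, 0x10000, 0x10001, 0x1F41B):
        print(f"U+{cp:04X}", encoded_sizes(chr(cp)))
```

Everything up to U+FFFF fits in one UTF-16 code unit, while anything from U+10000 upward needs two, which matches where the truncation starts.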
I believe this is a duplicate of #2446 (fixed in git). Please test the latest snapshots.