
RichTextCtrl incorrectly underlining and colouring text when re-writing it

Status: open. AndyW118 opened this issue 3 years ago • 16 comments

Operating system: Windows 10
wxPython version & source: 4.0.4, installed by PyCharm
Python version & source: 3.7

Description of the problem: I am developing a control derived from RichTextCtrl which uses a spell-checker to identify spelling mistakes and highlight them by underlining them and showing them in red. The control also has to deal with URLs. When text is entered, the control spell-checks the words and uses the BeginUnderline/EndUnderline and BeginTextColour/EndTextColour methods to mark the misspelled ones. It therefore has to rewrite the control's entire text, as this seems to be the only way to do it: it can't just 'mark' the existing text at specified positions.

The program initially uses the control to display "The wxRichTextCtrl ... generate an event." which it does properly. If the user then inserts text ("great" for example) into the control, the first letter (g) is correctly marked but when the "r" is entered the entire text apart from the URL is marked as wrongly spelled.

Code example:
import wx
import re
import wx.richtext as rt
from symspellpy import SymSpell, Verbosity
from symspellpy.suggest_item import SuggestItem
import pkg_resources

class _Match:
    def __init__(self, match : re.Match, isURL, suggestions=None):
        self.match = match
        self.isURL = isURL
        self.suggestions = suggestions

class RichTextControl(rt.RichTextCtrl):
    #   This is adapted from https://gist.github.com/gruber/8891611
    #   Global inline flags such as (?i) must come first; Python 3.11+ raises re.error otherwise
    _REGEXP = r"(?i)(?P<URL>\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|" \
                 r"coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|" \
                 r"at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|" \
                 r"cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|" \
                 r"gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|" \
                 r"je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|" \
                 r"ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|" \
                 r"pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|" \
                 r"su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|" \
                 r"vn|vu|wf|ws|ye|yt|yu|za|zm|zw)/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))" \
                 r"+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:'" \
                 r'".,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|' \
                 r'coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at' \
                 r'|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn' \
                 r'|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf' \
                 r'|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp' \
                 r'|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp' \
                 r'|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps' \
                 r'|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td' \
                 r'|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b/?(?!@))))' \
                 r'|(?P<WORD>\w+)'

    def __init__(self, *args, **kwargs):
        super(RichTextControl, self).__init__(*args, **kwargs)
        self.Bind(wx.EVT_TEXT, self.onText)
        self.regExp = re.compile(self._REGEXP)
        self.speller = SymSpell()
        path = pkg_resources.resource_filename("symspellpy", "frequency_dictionary_en_82_765.txt")
        print('Loading dictionary ...')
        self.speller.load_dictionary(path, 0, 1)
        self.urlStyle = rt.RichTextAttr()
        self.urlStyle.SetTextColour(wx.Colour(33, 94, 161))

    def onText(self, evt : wx.CommandEvent):
        self.ProcessText()

    def ProcessText(self):
        text = self.GetValue()
        m : re.Match
        matches = list(self.regExp.finditer(text))
        workList = list()
        for m in matches:
            word = m.group('WORD')
            if word is not None:
                result = self.speller.lookup(word, Verbosity.TOP)
                if result != word:
                    if len(result) == 1 and result[0].term.lower() == m.group().lower():
                        pass
                    elif len(result) > 0:  # mistake with suggestions
                        workList.append(_Match(m, False, result))
                    else:  # mistake, but no suggestions
                        workList.append(_Match(m, False, None))
            else:
                workList.append(_Match(m, True))
        if len(workList) > 0:
            self.ReplaceText(text, workList)

    def ReplaceText(self, originalText, workList):
        self.Freeze()
        csrPos = self.GetCaretPosition()
        self.Clear()
        pos = 0
        item : _Match
        for item in workList:
            self.WriteText(originalText[pos:item.match.start()])   #   append text preceding each group
            if item.isURL:
                self.BeginStyle(self.urlStyle)
                self.BeginURL(item.match.group())
                self.WriteText(item.match.group())
                self.EndStyle()
                self.EndURL()
            else:
                self.BeginUnderline()
                self.BeginTextColour(wx.Colour('red'))
                self.WriteText(item.match.group())
                self.EndTextColour()
                self.EndUnderline()
            pos = item.match.end()
        #   Append trailing text
        self.WriteText(originalText[item.match.end():])
        self.SetCaretPosition(csrPos)
        self.Thaw()
        pass

class RichTextFrame(wx.Frame):
    def __init__(self, *args, **kw):
        wx.Frame.__init__(self, *args, **kw)
        self.rtc = RichTextControl(self, style=wx.VSCROLL | wx.HSCROLL | wx.NO_BORDER)
        self.rtc.Bind(wx.EVT_TEXT_URL, self.OnURL)

        urlStyle = rt.RichTextAttr()
        urlStyle.SetTextColour(wx.Colour(33, 94, 161))

        self.rtc.WriteText("The wxRichTextCtrl can also display URLs, such as this one: ")
        self.rtc.BeginStyle(urlStyle)
        self.rtc.BeginURL("http://www.wxwidgets.org")
        self.rtc.WriteText("http://www.wxwidgets.org")
        self.rtc.EndURL()
        self.rtc.EndStyle()
        self.rtc.WriteText(". Click on the URL to generate an event.")

    def OnURL(self, evt):
        wx.MessageBox(evt.GetString(), "URL Clicked")


class TestPanel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent, -1)
        win = RichTextFrame(self, -1, "Rich-text frame", size=(700, 500), style = wx.DEFAULT_FRAME_STYLE)
        win.Show(True)

if __name__ == '__main__':
    app = wx.App(0)
    frame = wx.Frame(None)
    panel = TestPanel(frame)
    app.MainLoop()
![Screenshot 2022-09-08 182321](https://user-images.githubusercontent.com/46823317/189186448-5752b100-2cdc-4773-838b-87998a494ff8.jpg)

AndyW118, Sep 08 '22 17:09

Can you please make a simpler reproducer (i.e., an SSCCE)?

Also, can you try 4.2.0? 4.0.4 is quite old.

swt2c, Sep 08 '22 17:09

Thanks for replying. I have tried to install the latest version as follows: I ran `pip install wxPython==4.2.0` in PyCharm's Terminal window within the project. This worked without errors. I then told PyCharm to remove wxPython 4.0.4 and add 4.2, which was now visible as the latest version. Unfortunately this failed, as it was unable to find a module called attrDict (see screenshot). I can't find this module anywhere on my machine. [Screenshot 2022-09-08 191121]

AndyW118, Sep 08 '22 18:09

I'll try to think of a way to provide a simpler reproducer...

AndyW118, Sep 08 '22 18:09

Try `pip install attrdict3` before installing wxPython 4.2.0.

swt2c, Sep 08 '22 19:09

I now get this error when installing wxPython after installing attrdict3: [Screenshot 2022-09-08 210957]

I will remove code to do with URLs. Will that be minimal enough?

AndyW118, Sep 08 '22 20:09

I started a new PyCharm project and wxPython 4.2 installed successfully in it. The problem is still there though.

AndyW118, Sep 08 '22 20:09

Here is a simpler version which shows the problem. Start the program and then type "XX" (the only 'spelling mistake' the program recognises) anywhere in the string that appears. The XX is correctly shown in red and underlined. Type one more character (anything) and the entire line is wrongly shown in red and underlined.

import wx
import re
import wx.richtext as rt

class RichTextControl(rt.RichTextCtrl):
    _REGEXP = r'(?P<WORD>\w+)'

    def __init__(self, *args, **kwargs):
        super(RichTextControl, self).__init__(*args, **kwargs)
        self.Bind(wx.EVT_TEXT, self.onText)
        self.regExp = re.compile(self._REGEXP)
        self.urlStyle = rt.RichTextAttr()
        self.urlStyle.SetTextColour(wx.Colour(33, 94, 161))

    def onText(self, evt : wx.CommandEvent):
        self.ProcessText()

    def ProcessText(self):
        text : str = self.GetValue()
        m : re.Match
        matches = list(self.regExp.finditer(text))
        workList = list()
        for m in matches:
            word = m.group('WORD')
            if word is not None:
                if m.group() == 'XX':
                    workList.append(m)
        if len(workList) > 0:
            self.ReplaceText(text, workList)

    def ReplaceText(self, originalText, workList):
        self.Freeze()
        csrPos = self.GetCaretPosition()
        self.Clear()
        pos = 0
        item : re.Match
        for item in workList:
            self.WriteText(originalText[pos:item.start()])   #   append text preceding each group
            self.BeginUnderline()
            self.BeginTextColour(wx.Colour('red'))
            self.WriteText(item.group())
            self.EndTextColour()
            self.EndUnderline()
            pos = item.end()
        #   Append trailing text
        self.WriteText(originalText[item.end():])
        self.SetCaretPosition(csrPos)
        self.Thaw()

class RichTextFrame(wx.Frame):
    def __init__(self, *args, **kw):
        wx.Frame.__init__(self, *args, **kw)
        self.rtc = RichTextControl(self, style=wx.VSCROLL | wx.HSCROLL | wx.NO_BORDER)
        self.rtc.Bind(wx.EVT_TEXT_URL, self.OnURL)
        self.rtc.WriteText("The wxRichTextCtrl can also display URLs.")

    def OnURL(self, evt):
        wx.MessageBox(evt.GetString(), "URL Clicked")

class TestPanel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent, -1)
        win = RichTextFrame(self, -1, "Rich-text frame", size=(700, 500), style = wx.DEFAULT_FRAME_STYLE)
        win.Show(True)

if __name__ == '__main__':
    app = wx.App(0)
    frame = wx.Frame(None)
    panel = TestPanel(frame)
    app.MainLoop()

AndyW118, Sep 12 '22 09:09

Instead of using BeginUnderline(), BeginTextColour() and re-entering the text etc, why not create a RichTextAttr to define the style and then use SetStyle() to apply the style to the appropriate range of characters in the control?
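A minimal sketch of that suggestion, assuming the 'XX'-only checker from the simplified reproducer (any `is_misspelled(word)` callable would do). `find_misspelled_spans` and `highlight_misspellings` are hypothetical helper names, and the sketch assumes that string offsets from `GetValue()` line up with control positions, which should hold for plain text without embedded objects. The existing characters are restyled in place; nothing is cleared or rewritten.

```python
import re

WORD_RE = re.compile(r'(?P<WORD>\w+)')


def find_misspelled_spans(text, is_misspelled):
    """Return (start, end) offsets of every word for which is_misspelled(word) is True."""
    return [(m.start(), m.end())
            for m in WORD_RE.finditer(text)
            if is_misspelled(m.group('WORD'))]


def highlight_misspellings(ctrl, is_misspelled):
    """Underline misspelled words in red in an existing wx.richtext.RichTextCtrl.

    wx is imported lazily so the pure helper above can be used without it.
    Assumption: GetValue() offsets equal control positions (plain text only).
    """
    import wx
    import wx.richtext as rt

    bad = rt.RichTextAttr()
    bad.SetFontUnderlined(True)
    bad.SetTextColour(wx.Colour('red'))
    for start, end in find_misspelled_spans(ctrl.GetValue(), is_misspelled):
        # SetStyle restyles the characters in [start, end) without rewriting them
        ctrl.SetStyle(start, end, bad)


if __name__ == '__main__':
    print(find_misspelled_spans("The XX control", lambda w: w == 'XX'))  # [(4, 6)]
```

Because only the style of an existing range changes, the caret position and the rest of the text are left alone.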

reticulatus, Sep 12 '22 13:09

Fair point. But the control will eventually have to rewrite the text anyway, because (AFAIK) that is the only way for it to correct the spelling mistakes it finds: I believe it can't replace a range of characters.

AndyW118, Sep 12 '22 14:09

The Replace() method will do that.

`Replace(self, from_, to_, value)`

Replaces the content in the specified range with the string specified by value.

Parameters:

- from_ (long)
- to_ (long)
- value (string)
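A sketch of correcting several words with `Replace()`, assuming the corrections have already been found as non-overlapping `(start, end, replacement)` tuples (for example, from a spell-checker's top suggestion). Applying them from the rightmost range to the leftmost keeps the earlier offsets valid even when a replacement changes the text length. `apply_corrections` and `apply_corrections_to_ctrl` are hypothetical names; the plain-string version models what the control calls do.

```python
def _right_to_left(corrections):
    """Sort (start, end, replacement) tuples so the rightmost range comes first."""
    return sorted(corrections, key=lambda c: c[0], reverse=True)


def apply_corrections(text, corrections):
    """Plain-string model of the edits: splice each replacement into text.

    Assumes the (start, end) ranges do not overlap.
    """
    for start, end, replacement in _right_to_left(corrections):
        text = text[:start] + replacement + text[end:]
    return text


def apply_corrections_to_ctrl(ctrl, corrections):
    """Same edits on a wx.richtext.RichTextCtrl via Replace(from_, to_, value)."""
    for start, end, replacement in _right_to_left(corrections):
        ctrl.Replace(start, end, replacement)


if __name__ == '__main__':
    fixes = [(0, 3, "The"), (4, 8, "quick")]
    print(apply_corrections("Teh quik fox", fixes))  # The quick fox
```

Working right to left avoids having to shift every later offset by the length difference of each earlier replacement.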

reticulatus, Sep 12 '22 14:09

Yes, I have just noticed it! I will try your suggestion, thanks. However, I do think there is a bug: my original method should work too.

AndyW118, Sep 12 '22 14:09

I tried to run your example code. I think I have fixed the formatting issues caused by pasting to this list.

However, I'm getting an error where it tries to compile the regex:

`re.error: unknown extension ?P\w at position 1`

Is the statement `_REGEXP = r'(?P\w+)'` correct?
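For reference, a named group needs its name in angle brackets, `(?P<WORD>\w+)`; without the `<WORD>` part the pattern is rejected with exactly this kind of `re.error`, which suggests the name was lost when the code was pasted. A small check using only the standard `re` module (`word_spans` and `is_valid_pattern` are illustrative names):

```python
import re

WORD_RE = re.compile(r'(?P<WORD>\w+)')   # named group syntax: (?P<name>pattern)


def word_spans(text):
    """Return (word, start, end) for every run of word characters in text."""
    return [(m.group('WORD'), m.start(), m.end()) for m in WORD_RE.finditer(text)]


def is_valid_pattern(pattern):
    """True if re can compile pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == '__main__':
    print(word_spans("The XX control"))        # [('The', 0, 3), ('XX', 4, 6), ('control', 7, 14)]
    print(is_valid_pattern(r'(?P<WORD>\w+)'))  # True
    print(is_valid_pattern(r'(?P\w+)'))        # False
```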

reticulatus, Sep 12 '22 15:09

No, sorry. The code quoting (Ctrl+E) is mangling the code and I hadn't noticed. I've attached the code instead, as I can't get it to display inline correctly: richTextURLdetection.zip. And the attached version actually works! So all I have to do now is figure out why my original version (rejected as too complicated) does not.

AndyW118, Sep 12 '22 15:09

Thanks for uploading the code.

I think this is a feature of the RichTextCtrl. You can see it doing the same thing in the example in the wxPython demo: if you put the cursor at the end of any section that has a style applied, the same style is applied to any new text you enter from there. This includes spaces, on which the style is invisible.

What I do in my app is to clear all the styles whenever the text is changed and regenerate them based on the content.
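The thread does not show that app's code, so here is an illustrative sketch of the clear-and-regenerate idea, assuming the misspelled `(start, end)` spans come from the spell-checker and that `GetValue()` offsets match control positions. `plan_runs` splits the whole text into contiguous runs, each marked plain or highlighted, and `restyle` gives every run a definite style, so a character typed just after a highlighted word loses the inherited style on the next pass. The names, and black as the plain colour, are assumptions. `restyle` is meant to be called from the EVT_TEXT handler; whether `SetStyle` itself raises EVT_TEXT is not established here, so a re-entrancy guard may be worth adding.

```python
def plan_runs(length, bad_spans):
    """Cover [0, length) with contiguous (start, end, highlighted) runs.

    bad_spans are non-overlapping (start, end) ranges to highlight; every gap
    between them becomes a plain run, so no character keeps a stale style.
    """
    runs, pos = [], 0
    for start, end in sorted(bad_spans):
        if start > pos:
            runs.append((pos, start, False))
        runs.append((start, end, True))
        pos = end
    if pos < length:
        runs.append((pos, length, False))
    return runs


def restyle(ctrl, bad_spans):
    """Reapply plain/highlight styles across the whole wx.richtext.RichTextCtrl."""
    import wx
    import wx.richtext as rt

    plain = rt.RichTextAttr()
    plain.SetFontUnderlined(False)
    plain.SetTextColour(wx.Colour('black'))   # assumption: normal text is black
    bad = rt.RichTextAttr()
    bad.SetFontUnderlined(True)
    bad.SetTextColour(wx.Colour('red'))

    ctrl.Freeze()
    try:
        for start, end, highlighted in plan_runs(len(ctrl.GetValue()), bad_spans):
            ctrl.SetStyle(start, end, bad if highlighted else plain)
    finally:
        ctrl.Thaw()


if __name__ == '__main__':
    print(plan_runs(14, [(4, 6)]))  # [(0, 4, False), (4, 6, True), (6, 14, False)]
```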

reticulatus, Sep 12 '22 16:09

Thanks for all your help. I'm working on using styles now ...

AndyW118, Sep 12 '22 16:09

Happy to help. If you wish to discuss anything else about the RTC, or other wxPython topics, I would recommend posting a question at https://discuss.wxpython.org/

reticulatus, Sep 12 '22 16:09