fix crash on unicode U+FFFD character

Open DivineDominion opened this issue 12 years ago • 5 comments

Some files I worked with have broken encoding. The Unicode 'replacement character' U+FFFD may appear in filenames or note text bodies where umlauts etc. used to be.

nvPY crashes when these files are added. A test string:

"Vorl�ufigkeit von Wissen bei Weber"

DivineDominion avatar Oct 26 '12 13:10 DivineDominion

Are you using the print_columns mode? With that option I can reproduce the error. Or does it also crash in other situations?

swestdijk avatar Oct 26 '12 18:10 swestdijk

It happens with nvPY's "out of the box" config. Which mode would I have to choose to see whether it also crashes in other circumstances?

DivineDominion avatar Nov 01 '12 10:11 DivineDominion

Could you start nvpy from the command line, make it crash, and then paste the traceback here?

cpbotha avatar Nov 04 '12 21:11 cpbotha

I downloaded the nvpy git archive 30 minutes ago and got this:

python.exe nvpy-master\nvpy\nvpy.py
Traceback (most recent call last):
  File "nvpy-master\nvpy\nvpy.py", line 696, in <module>
    main()
  File "nvpy-master\nvpy\nvpy.py", line 691, in main
    controller = Controller()
  File "nvpy-master\nvpy\nvpy.py", line 242, in __init__
    self.notes_db = NotesDB(self.config)
  File "C:\nvpy-master\nvpy\notes_db.py", line 108, in __init__
    c = f.read()
  File "C:\Python27\lib\codecs.py", line 671, in read
    return self.reader.read(size)
  File "C:\Python27\lib\codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 48: invalid start byte

I wrote a Ruby script to replace characters in 'broken' files. nvPY works with the converted files.
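
For reference, the byte 0xfc in the traceback is what 'ü' looks like in Latin-1 (ISO-8859-1). It can never start a UTF-8 sequence, hence the "invalid start byte". A properly encoded U+FFFD, on the other hand, is valid UTF-8 and decodes without error. Below is a minimal Python sketch of a tolerant read, assuming the broken files are really Latin-1. The names `decode_lenient` and `read_text_lenient` are made up for illustration and are not part of nvPY:

```python
import io


def decode_lenient(raw, fallback_encoding="latin-1"):
    """Decode bytes as UTF-8; fall back to another encoding on failure.

    Assumption: files that are not valid UTF-8 are Latin-1.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every possible byte to a character,
        # so with the default fallback this decode cannot fail.
        return raw.decode(fallback_encoding)


def read_text_lenient(path, fallback_encoding="latin-1"):
    """Read a whole file as text without raising on non-UTF-8 bytes."""
    with io.open(path, "rb") as f:
        return decode_lenient(f.read(), fallback_encoding)


# A lone 0xfc byte is not valid UTF-8, but it is 'ü' in Latin-1.
latin1_bytes = b"\xfcber"
recovered = decode_lenient(latin1_bytes)

# U+FFFD itself is a valid character and survives a UTF-8 round trip.
replacement_bytes = u"Vorl\ufffdufigkeit".encode("utf-8")
unchanged = decode_lenient(replacement_bytes)
```

If the real encoding is unknown, `raw.decode("utf-8", errors="replace")` is an alternative that never raises, at the cost of turning each undecodable byte into U+FFFD.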

DivineDominion avatar Nov 05 '12 16:11 DivineDominion

I can confirm that setting print_columns = 0 in my ~/.nvpy.cfg (where I also have layout = vertical) solves a crash on first sync for me. The log file was full of other exceptions, TypeErrors and _tkinter.TclErrors, but in my case no UnicodeDecodeErrors. Just… weird.

I have almost a thousand notes, so it would've been difficult to pinpoint which one (if any of them) was causing the problem, but I do use Unicode (including German alphabet) characters extensively in my notes, which is what led me to this issue.

@swestdijk Out of curiosity, how on earth did it occur to you that print_columns had anything to do with that? I don't see any other issues related to that setting, and yet, yep, that was exactly what caused the crash for me.

ernstki avatar Jan 16 '20 01:01 ernstki