RecuperaBit

tikzplot raises UnicodeDecodeError and terminates process

Open nyanpasu64 opened this issue 5 years ago • 3 comments

  • The final while True loop should catch all exceptions and keep running, while still printing the stack trace. I know swallowing exceptions is generally a bad idea (though perhaps only when you don't print or log them, or when the program cannot possibly continue?). But individual commands failing should not corrupt internal state, and losing analysis data built over an hour is a terrible user experience.
  • Maybe you should add an option to save (e.g. pickle/cPickle) the 7GB of RAM to a file, and reload it later on.
  • The crash below.
    • Notes: NTFS stores UTF-16 (unpaired surrogates allowed) which can be encoded as WTF-8. Python 2 has 8-bit bytes/str and arbitrary-bit unicode. For the minimum changes to your code, you could try latin1 instead of ascii.
> tikzplot 67
Traceback (most recent call last):
  File "main.py", line 385, in <module>
    main()
  File "main.py", line 382, in main
    interpret(cmd, arguments, parts, shorthands, args.outputdir)
  File "main.py", line 171, in interpret
    print utils.tikz_part(part)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 310, in tikz_part
    lines += [tikz_child(entry, 4)[0] for entry in (part.root, part.lost)]
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 280, in tikz_child
    lines = [r'%schild {%s' % (pad, _tikz_repr(directory))]
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 273, in _tikz_repr
    _ltx_clean(node.index), _ltx_clean(node.name)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 263, in _ltx_clean
    clean = str(label).replace('$', r'\$').replace('_', r'\_')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 11: ordinal not in range(128)

Git master 18090ab9355592b623200b9e3317e1a699d6909b
Python 2.7.13 (8cdda8b8cdb8ff29d9e620cccd6c5edd2f2a23ec, Apr 16 2019, 18:25:57) [PyPy 7.1.1 with GCC 8.2.0]
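
A minimal sketch of the loop behaviour asked for in the first bullet: each command runs inside its own try block, the stack trace is printed, and the loop carries on. It is written for Python 3 and is not taken from RecuperaBit's main.py; `run_commands` and the `interpret` callback are hypothetical names used only for illustration.

```python
import traceback


def run_commands(lines, interpret):
    """Run every command in `lines`; a failing command does not stop the loop.

    `interpret` is any callable that takes one command line (hypothetical
    stand-in for the real command interpreter). Returns the number of
    commands that raised.
    """
    failures = 0
    for line in lines:
        try:
            interpret(line)
        except Exception:
            # Print the full stack trace so the error is never silently
            # swallowed, then continue with the next command.
            traceback.print_exc()
            failures += 1
    return failures


if __name__ == "__main__":
    def demo_interpret(line):
        # Simulate one command that fails the way tikzplot did.
        if line.startswith("tikzplot"):
            "\xf1".encode("ascii")
        print("ok:", line)

    print("failed commands:", run_commands(["tree 0", "tikzplot 67", "tree 1"], demo_interpret))
```

Catching `Exception` rather than using a bare `except:` leaves `KeyboardInterrupt` and `SystemExit` alone, so Ctrl+C and a quit command still end the session.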

nyanpasu64, Jun 10 '19 05:06

All your comments are spot-on and I agree with what you say. I am actually surprised that somebody else tried the TikzPlot; it was left there only because I needed some figures for my thesis. 😄

> Maybe you should add an option to save (e.g. pickle/cPickle) the 7GB of RAM to a file

With a future 2.0 version in mind, I was thinking about using a more advanced file format than the current savefile, which is quite poor. It could be based on a SQLite DB, so RAM usage would drop considerably as well.
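
As a rough illustration of the SQLite idea, the sketch below keeps directory-tree nodes in a table and fetches only the children of one node on demand instead of holding everything in RAM. The `nodes` schema and the helper names are assumptions made up for this example; they do not describe RecuperaBit's savefile or any planned format.

```python
import sqlite3


def open_store(path):
    """Open (or create) a node store at `path`; ':memory:' works for tests."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        " idx INTEGER PRIMARY KEY,"   # hypothetical: record index of the node
        " parent INTEGER,"            # index of the parent node, NULL for a root
        " name TEXT NOT NULL)"
    )
    return conn


def add_node(conn, idx, parent, name):
    """Insert or overwrite one node."""
    conn.execute(
        "INSERT OR REPLACE INTO nodes (idx, parent, name) VALUES (?, ?, ?)",
        (idx, parent, name),
    )


def children(conn, parent):
    """Return (idx, name) pairs for the direct children of `parent`."""
    rows = conn.execute(
        "SELECT idx, name FROM nodes WHERE parent = ? ORDER BY idx", (parent,)
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = open_store(":memory:")
    add_node(store, 5, None, "root")
    add_node(store, 67, 5, "ma\xf1ana")
    store.commit()
    print(children(store, 5))
```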

> Python 2 has 8-bit bytes/str and arbitrary-bit unicode

True. The fact that RecuperaBit is currently written in Python 2 causes several issues with Unicode. It should definitely be ported to Python 3.
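
To make the difference concrete, here is a hedged Python 3 sketch. In Python 3, `str` is already Unicode, so the same escaping that failed in `_ltx_clean` needs no implicit ASCII encoding step. The `decode_ntfs_name` helper is hypothetical and only shows one way to keep unpaired surrogates when decoding UTF-16 bytes; neither function is taken from the actual v1.1.x code.

```python
def ltx_clean(label):
    """Escape '$' and '_' for LaTeX; any Unicode label is accepted in Python 3."""
    return str(label).replace('$', r'\$').replace('_', r'\_')


def decode_ntfs_name(raw):
    """Decode UTF-16-LE bytes, keeping unpaired surrogates instead of raising.

    Hypothetical helper: 'surrogatepass' lets lone surrogates through as
    code points, so such names survive in memory. They still cannot be
    encoded with a strict codec later on.
    """
    return raw.decode('utf-16-le', errors='surrogatepass')


if __name__ == "__main__":
    print(ltx_clean('ma\xf1ana_$'))
    name = decode_ntfs_name(b'a\x00\x00\xd8b\x00')
    print(len(name), ascii(name))
```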

Unfortunately, I do not have a lot of free time these days and thus I am not in a position to provide time estimates for this task.

Lazza, Jun 24 '19 10:06

Sigh, I guess this was also responsible for my crash with tree

Traceback (most recent call last):
  File "main.py", line 384, in <module>
    main()
  File "main.py", line 381, in main
    interpret(cmd, arguments, parts, shorthands, args.outputdir)
  File "main.py", line 118, in interpret
    print utils.tree_folder(part.lost)
  File "python-2_7_17_amd64\lib\codecs.py", line 369, in write
    data, consumed = self.encode(object, self.errors)
  File "python-_7_17_amd64\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 11295777-11295778: character maps to <undefined>
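
The traceback suggests the console codec (cp1252 here) cannot represent some characters in the tree output. One possible workaround, sketched for Python 3 with a hypothetical helper name, is to replace unencodable characters with backslash escapes before printing, so the command finishes instead of raising.

```python
import sys


def printable(text, encoding):
    """Return `text` with characters missing from `encoding` shown as escapes."""
    # Encode with 'backslashreplace' so unsupported characters become
    # sequences such as \u4e2d, then decode back to get a printable str.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


if __name__ == "__main__":
    tree = 'caf\xe9 \u4e2d'
    # sys.stdout.encoding can be None when output is redirected in some setups.
    print(printable(tree, sys.stdout.encoding or 'utf-8'))
```

The escapes are lossy for display only; the original names stay intact in memory.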

mirh, Feb 04 '20 00:02

@nyanpasu64 I was wondering if, by any chance, you could try the newly released v1.1.2 (for Python 3) and see whether it still gives you those errors.

@mirh some feedback from you would be great as well.

Thanks to both of you for your time.

Lazza, Jan 02 '21 17:01