RecuperaBit
tikzplot raises UnicodeEncodeError and terminates process
- The final `while True` loop should catch and ignore all exceptions (but still print the stack trace). I know it's generally a bad idea to swallow exceptions (or is that only when you don't print or log them, and the program cannot possibly continue?). But a single failing command should not corrupt internal state, and losing analysis data built up over an hour is a terrible user experience.
- Maybe you should add an option to save (e.g. pickle/cPickle) the 7GB of RAM to a file, and reload it later on.
- The crash below.
- Notes: NTFS stores file names as UTF-16 (unpaired surrogates allowed), which can be encoded as WTF-8. Python 2 has 8-bit bytes/str and arbitrary-bit unicode. For the smallest change to your code, you could try latin1 instead of ascii.
```
> tikzplot 67
Traceback (most recent call last):
  File "main.py", line 385, in <module>
    main()
  File "main.py", line 382, in main
    interpret(cmd, arguments, parts, shorthands, args.outputdir)
  File "main.py", line 171, in interpret
    print utils.tikz_part(part)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 310, in tikz_part
    lines += [tikz_child(entry, 4)[0] for entry in (part.root, part.lost)]
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 280, in tikz_child
    lines = [r'%schild {%s' % (pad, _tikz_repr(directory))]
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 273, in _tikz_repr
    _ltx_clean(node.index), _ltx_clean(node.name)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 263, in _ltx_clean
    clean = str(label).replace('$', r'\$').replace('_', r'\_')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 11: ordinal not in range(128)
```
Git master 18090ab9355592b623200b9e3317e1a699d6909b Python 2.7.13 (8cdda8b8cdb8ff29d9e620cccd6c5edd2f2a23ec, Apr 16 2019, 18:25:57) [PyPy 7.1.1 with GCC 8.2.0]
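The first suggestion, keeping the interactive loop alive when one command fails, could look roughly like the Python 3 sketch below. `command_loop`, `lines_reader` and the `handle` callback are hypothetical names, not RecuperaBit's actual API; `handle` stands in for main.py's `interpret()`. It only helps under the assumption stated above, namely that a failing command leaves the shared analysis state consistent.

```python
import traceback


def command_loop(read_line, handle):
    """Run commands until EOF or 'quit'; a failing command does not end the session.

    read_line: callable returning the next input line and raising EOFError at
               the end of input (the built-in input() behaves this way).
    handle:    callable(cmd, arguments) executing one command; a hypothetical
               stand-in for main.py's interpret().
    Returns the number of commands that raised an exception.
    """
    failures = 0
    while True:
        try:
            line = read_line()
        except EOFError:
            break
        parts = line.split()
        if not parts:
            continue  # ignore empty lines
        cmd, arguments = parts[0], parts[1:]
        if cmd == "quit":
            break
        try:
            handle(cmd, arguments)
        except Exception:
            # Exception (not BaseException): Ctrl+C (KeyboardInterrupt) and
            # sys.exit() still stop the program as usual.
            # Print the stack trace so the bug can be reported, then keep the
            # session (and its in-memory analysis data) alive.
            traceback.print_exc()
            failures += 1
    return failures


def lines_reader(lines):
    """Wrap a list of strings in an input()-like callable (useful for testing)."""
    iterator = iter(lines)

    def read_line():
        try:
            return next(iterator)
        except StopIteration:
            raise EOFError
    return read_line
```

Interactively this would be driven by something like `command_loop(lambda: input("> "), handle)`; `lines_reader` just makes the loop easy to exercise without a terminal.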
All your comments are spot-on and I agree with what you say. I am actually surprised that somebody else tried the TikzPlot command; it was only left there because I needed some figures for my thesis. 😄
> Maybe you should add an option to save (e.g. pickle/cPickle) the 7GB of RAM to a file
With a future 2.0 version in mind, I was thinking about using a more advanced file format rather than the current savefile, which is quite limited. It could be based on an SQLite database, so RAM usage would drop considerably as well.
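A minimal sketch of the SQLite idea, using Python's built-in sqlite3 module, is shown below. The table layout (part, idx, parent, name) and the helper names are assumptions for illustration, not a planned RecuperaBit format; the point is that a directory listing can be fetched on demand instead of keeping every node in RAM.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical node store; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        " part INTEGER NOT NULL,"   # partition number
        " idx INTEGER NOT NULL,"    # record index within the partition
        " parent INTEGER,"          # parent record index, NULL if unknown
        " name TEXT NOT NULL,"      # file name, stored as Unicode text
        " PRIMARY KEY (part, idx))"
    )
    # Index so that listing one directory does not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS nodes_parent ON nodes (part, parent)")
    return conn


def save_nodes(conn, part, nodes):
    """Insert or replace (idx, parent, name) tuples for one partition."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO nodes (part, idx, parent, name) VALUES (?, ?, ?, ?)",
            ((part, idx, parent, name) for idx, parent, name in nodes),
        )


def children(conn, part, parent):
    """Fetch only the direct children of one directory, sorted by name."""
    rows = conn.execute(
        "SELECT idx, name FROM nodes WHERE part = ? AND parent = ? ORDER BY name",
        (part, parent),
    )
    return rows.fetchall()
```

Because the database lives on disk (pass a file path instead of `":memory:"`), it would also double as the savefile that survives a crash.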
> Python 2 has 8-bit bytes/str and arbitrary-bit unicode
True. Being written in Python 2 causes RecuperaBit several issues with Unicode; it should definitely be ported to Python 3.
Unfortunately, I do not have a lot of free time these days and thus I am not in a position to provide time estimates for this task.
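With the Python 3 port mentioned above, the `str(label)` call in `_ltx_clean` no longer has to encode to ASCII, so a name such as `año` passes straight through. A hedged sketch of a Unicode-safe replacement follows; `ltx_clean` is a hypothetical name, and escaping characters beyond `$` and `_` is an addition the original function does not do. (For comparison, the latin1 workaround suggested earlier only covers code points up to U+00FF; names with, say, CJK characters would still raise.)

```python
# Characters with a special meaning in LaTeX, mapped to escaped forms.
# The original _ltx_clean only handled '$' and '_'; the rest are additions.
_LATEX_ESCAPES = {
    ord("\\"): r"\textbackslash{}",
    ord("$"): r"\$",
    ord("_"): r"\_",
    ord("%"): r"\%",
    ord("&"): r"\&",
    ord("#"): r"\#",
    ord("{"): r"\{",
    ord("}"): r"\}",
}


def ltx_clean(label):
    """Python 3 sketch of a Unicode-safe _ltx_clean (hypothetical name).

    Non-ASCII characters such as 'ñ' are kept as-is. bytes labels are decoded
    as UTF-8, with U+FFFD substituted for invalid sequences instead of raising.
    """
    if isinstance(label, bytes):
        label = label.decode("utf-8", errors="replace")
    # str() handles non-string labels such as integer record indexes.
    # translate() makes a single pass, so the backslashes and braces it
    # inserts are not escaped a second time.
    return str(label).translate(_LATEX_ESCAPES)
```

Lone surrogates coming from NTFS names are valid Python 3 `str` values and pass through unchanged; they only become a problem when the result is written to a strict UTF-8 file.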
Sigh, I guess this was also responsible for my crash with `tree`:
```
Traceback (most recent call last):
  File "main.py", line 384, in <module>
    main()
  File "main.py", line 381, in main
    interpret(cmd, arguments, parts, shorthands, args.outputdir)
  File "main.py", line 118, in interpret
    print utils.tree_folder(part.lost)
  File "python-2_7_17_amd64\lib\codecs.py", line 369, in write
    data, consumed = self.encode(object, self.errors)
  File "python-2_7_17_amd64\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 11295777-11295778: character maps to <undefined>
```
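This second crash happens while printing the tree to a Windows console whose encoding is cp1252. A minimal Python 3 sketch that escapes unencodable characters instead of raising is below; `console_safe` is a hypothetical helper, not part of RecuperaBit.

```python
import sys


def console_safe(text, encoding=None):
    """Return text that can be written to a stream using `encoding`.

    Characters the encoding cannot represent (including lone surrogates)
    become backslash escapes such as '\\u4e2d' instead of raising
    UnicodeEncodeError. Defaults to sys.stdout's encoding, else UTF-8.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)
```

On Python 3.7+ a similar effect can be obtained globally with `sys.stdout.reconfigure(errors="backslashreplace")`, and the `PYTHONIOENCODING` environment variable (e.g. `utf-8`) changes the encoding of the standard streams without touching the code.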
@nyanpasu64 I was wondering if, by any chance, you could try the newly released v1.1.2 (for Python 3) and see if it still gives you those errors.
@mirh some feedback from you would be great as well.
Thanks to both of you for your time.