vint
Fix UnicodeDecodeError
The original code does not check whether scriptencoding appears inside a comment, so a UnicodeDecodeError occurs for code like the following:
" scriptencoding とは
Traceback (most recent call last):
  File "/Users/tmsanrinsha/python/bin/vint", line 11, in <module>
    sys.exit(main())
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/__init__.py", line 11, in main
    init_cli()
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/bootstrap.py", line 22, in init_cli
    cli.start()
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/linting/cli.py", line 27, in start
    violations = self._lint_all(env, config_dict)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/linting/cli.py", line 120, in _lint_all
    violations += linter.lint_file(file_path)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/linting/linter.py", line 107, in lint_file
    root_ast = self._parser.parse_file(path)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/ast/parsing.py", line 37, in parse_file
    decoded = decoder.read(file_path)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/encodings/decoder.py", line 30, in read
    string = self.strategy.decode(hunk, debug_hint=debug_hint_for_the_loc)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/encodings/decoding_strategy.py", line 45, in decode
    string_candidate = strategy.decode(bytes_seq, debug_hint)
  File "/Users/tmsanrinsha/python/lib/python/site-packages/vint/encodings/decoding_strategy.py", line 77, in decode
    return bytes_seq.decode(encoding=encoding_part.decode(encoding='ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
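The last frame can be reproduced in isolation. This is a minimal sketch, assuming that the bytes following scriptencoding in the commented line (the UTF-8 encoding of とは) are what ends up in encoding_part. The variable name mirrors the traceback; everything else is illustrative and not taken from vint.

```python
# -*- coding: utf-8 -*-
# Assumption: the text after "scriptencoding " in the comment line is
# treated as the encoding name and decoded as ASCII.
encoding_part = u'とは'.encode('utf-8')  # b'\xe3\x81\xa8\xe3\x81\xaf'

message = None
try:
    encoding_part.decode(encoding='ascii')
except UnicodeDecodeError as error:
    message = str(error)

print(message)
# 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
```

The byte 0xe3 is the first byte of と in UTF-8, which matches the position reported in the traceback.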
This PR fixes the problem.
Sample code:
#!/usr/bin/env python
import re


def _split_by_scriptencoding(bytes_seq):
    # type: (bytes) -> [(str, bytes)]
    # Split the input into hunks, each starting at a scriptencoding
    # command that is the first token on its line.
    max_end_index = len(bytes_seq)
    start_index = 0
    bytes_seq_and_loc_list = []

    for m in re.finditer(br'^\s*(scriptencoding)', bytes_seq, re.MULTILINE):
        end_index = m.start(1)

        # A command at the very beginning does not close a preceding hunk.
        if end_index == 0:
            continue

        bytes_seq_and_loc_list.append((
            "{start_index}:{end_index}".format(start_index=start_index, end_index=end_index),
            bytes_seq[start_index:end_index]
        ))
        start_index = end_index

    # Everything after the last command forms the final hunk.
    bytes_seq_and_loc_list.append((
        "{start_index}:{end_index}".format(start_index=start_index, end_index=max_end_index),
        bytes_seq[start_index:max_end_index]
    ))

    return bytes_seq_and_loc_list


# The last line of the sample is indented by one space on purpose.
source = '''scriptencoding utf-8
" scriptencoding あ
echo 'scriptencoding い'
 scriptencoding utf-8
'''
print(_split_by_scriptencoding(source.encode('utf-8')))
Output:
[('0:69', b'scriptencoding utf-8\n" scriptencoding \xe3\x81\x82\necho \'scriptencoding \xe3\x81\x84\'\n '), ('69:90', b'scriptencoding utf-8\n')]
Sorry for the late reply.
We should support the following unusual case if we can:
:::::
\scriptencoding utf8
How do you feel about it?
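For reference, one way this case might be matched is sketched below. It assumes that only leading spaces, tabs, and colons may precede the command, and that a continuation line starts with optional indentation followed by a backslash. The names SCRIPTENCODING_PATTERN and find_scriptencoding_offsets are hypothetical and not part of vint.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical pattern, not vint's actual one. Before the keyword it allows
# any mix of spaces, tabs, and colons, plus line-continuation breaks
# (a newline, optional indentation, then a backslash).
SCRIPTENCODING_PATTERN = re.compile(
    br'^(?:[ \t:]|\n[ \t]*\\)*(scriptencoding)',
    re.MULTILINE,
)


def find_scriptencoding_offsets(bytes_seq):
    """Return the byte offset of each scriptencoding command keyword."""
    return [m.start(1) for m in SCRIPTENCODING_PATTERN.finditer(bytes_seq)]


# The case from the comment above: colons, then a continuation line.
abnormal = b':::::\n\\scriptencoding utf8\n'
# A commented-out occurrence, which should not be treated as a command.
commented = u'" scriptencoding あ\n'.encode('utf-8')

print(find_scriptencoding_offsets(abnormal))   # [7]
print(find_scriptencoding_offsets(commented))  # []
```

This sketch does not handle abbreviated command names or other prefixes Vim accepts, so it is a starting point for discussion rather than a complete solution.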
@tmsanrinsha Please reply to the last comment from @Kuniwak / provide an update.
Also a test would be needed.
@tmsanrinsha Ping. I'd like to do a new release soonish, and it would be great to have this included.