Issues with UTF-8 characters when run in BashOnWindows
I'm getting the following error:
File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/__main__.py", line 85, in <module>
main(args, pool)
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/__main__.py", line 52, in main
(generation_fittest, population) = generate(population, source_filenames, args, pool)
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/__main__.py", line 39, in generate
scored_population = score_population(population, source_filenames, args, pool)
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/__main__.py", line 35, in score_population
return pool.map(task, population)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 768: ordinal not in range(128)
Unfortunately, I don't know which file or code point is causing the issue, since the traceback doesn't mention a filename.
It might be a configuration error on my Windows Subsystem for Linux, since locale seems to be set incorrectly:
$ locale
LANGUAGE =
LC_ALL =
LANG = "en_US.UTF-8"
However, it doesn't help to explicitly set the locale.
See Microsoft/BashOnWindows#1544 for a possible explanation.
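To see which encoding Python itself picks up from the environment, independent of what the locale command prints, something like the following can be run under the same shell. This is only a diagnostic sketch: describe_encodings is a hypothetical helper name, not part of the tool. In Python 3.5, open() without an encoding argument falls back to locale.getpreferredencoding(False), so that value is the one that matters for the error above.

```python
import locale
import sys


def describe_encodings():
    """Return the default encodings Python reports in this environment.

    Hypothetical diagnostic helper, not part of the tool itself.
    """
    return {
        # Used by open() when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Used to encode and decode file names.
        "filesystem": sys.getfilesystemencoding(),
        # Used when printing to the terminal (may be None if redirected).
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("{} = {}".format(name, value))
```

If "preferred" comes back as ANSI_X3.4-1968 or US-ASCII, Python is treating the locale as plain ASCII regardless of what LANG claims.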
I've removed the reference to ASCII and replaced it with UTF-8. That should fix it but I'm running Debian/Bash so it's tricky for me to confirm.
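For anyone following along, a minimal sketch of the general idea, assuming the change amounts to passing an explicit encoding when reading source files (read_source is a hypothetical name, not necessarily what the code in master uses). Byte 0xc3 from the traceback is a typical UTF-8 lead byte, which the ASCII codec rejects.

```python
def read_source(path):
    """Read a source file as UTF-8 regardless of the locale's default.

    Hypothetical helper for illustration only.
    """
    # An explicit encoding means the result no longer depends on
    # LANG / LC_ALL being set up correctly in the shell.
    with open(path, "r", encoding="utf-8") as source_file:
        return source_file.read()


# The two bytes 0xc3 0xa9 are the UTF-8 encoding of "é".
sample = b"\xc3\xa9"
decoded_as_utf8 = sample.decode("utf-8")

try:
    sample.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    # Same class of error as in the traceback above.
    ascii_failed = True
```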
Fixed in master as of #12
This did indeed fix the original issue. However, I'm getting a different error now:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/__main__.py", line 30, in __call__
return (measure(config, self._source_filenames, self._args), config)
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/measure.py", line 37, in measure
scores = [measure_file(source_filename, workspace_path, args.command) for source_filename in source_filenames]
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/measure.py", line 37, in <listcomp>
scores = [measure_file(source_filename, workspace_path, args.command) for source_filename in source_filenames]
File "/mnt/c/Users/sqrt/Downloads/clang-unformat/measure.py", line 19, in measure_file
source = source_file.read()
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 16: invalid start byte
I've tried playing around with codecs.open() and/or open(..., encoding='utf-8'), but with no success.
Maybe it's a limitation of Bash on Windows, and there's not much you can do?
One thing I can do is improve error reporting to display the name of the file which is being decoded. If you have any idea which file it is and can send it to me, that would also help. It might not be a Windows-only issue.
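Roughly what I have in mind, as a sketch rather than the final change (read_source_reporting is a placeholder name, and the UTF-8 default is an assumption): catch the decode error where the file is read and re-raise it with the path attached, so the message survives the trip back through the multiprocessing pool.

```python
def read_source_reporting(path, encoding="utf-8"):
    """Read a file, naming the file in the error if decoding fails.

    Placeholder helper for illustration only.
    """
    try:
        with open(path, "r", encoding=encoding) as source_file:
            return source_file.read()
    except UnicodeDecodeError as error:
        # A plain ValueError with a single string argument pickles cleanly,
        # so worker processes can hand it back to the parent unchanged.
        message = "cannot decode {!r} as {}: {}".format(path, encoding, error)
        raise ValueError(message) from error
```

The original UnicodeDecodeError stays available as the __cause__ of the new exception when it is caught in the same process.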
I've extracted the failing file. It's ftd2xx.h from the FTDI driver API (see https://gist.github.com/sqrt/7db06562b157ad33316d3921bde9e902). Seems like the culprit is the "©" character.
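For reference, a quick check of how "©" is represented in different encodings. This only illustrates the bytes involved and is not a confirmed diagnosis of my copy of the file: a lone 0xa9 byte is what Latin-1 (and Windows-1252) use for "©", whereas UTF-8 uses the two bytes 0xc2 0xa9, which would be consistent with the "invalid start byte" message if the file on disk is not actually UTF-8.

```python
copyright_sign = "\u00a9"  # the "©" character

utf8_bytes = copyright_sign.encode("utf-8")      # two bytes: 0xc2 0xa9
latin1_bytes = copyright_sign.encode("latin-1")  # one byte: 0xa9


def can_decode(data, encoding):
    """Return True if the bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xa9 is a continuation byte in UTF-8, so it cannot start a character.
lone_byte_is_valid_utf8 = can_decode(latin1_bytes, "utf-8")
```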
As far as I can tell, this is a UTF-8 character in a UTF-8 file. I can open it from multiple GUI and CL tools under Ubuntu 16.04 without trouble and am able to pass it into clang-format too. You might have a different clang-format version which might produce different results, or you might have ended up with a .clang-format file which is especially good at making something fail.
I'm going to try to shore up the error output in future changes, including printing the filename in error messages and providing a --debug option which dumps diagnostic information. But I cannot think of anything else that would fix this easily. I'll leave the issue open in case I ever set up a Python environment on my Windows partition. Perhaps you could try loading and saving the file to see if it changes anything.
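If doing the load-and-save by hand is awkward, here is a small sketch of the same idea in Python. The helper name resave_as_utf8 and the Latin-1 fallback are my assumptions, so check the result in an editor and keep a backup, since the file is rewritten in place.

```python
def resave_as_utf8(path, fallback_encoding="latin-1"):
    """Rewrite a file in place as UTF-8.

    Returns True if the fallback encoding had to be used.
    Hypothetical helper; the Latin-1 fallback is an assumption.
    """
    # Work on raw bytes so line endings are preserved exactly.
    with open(path, "rb") as source_file:
        raw = source_file.read()

    try:
        text = raw.decode("utf-8")
        used_fallback = False
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail,
        # but the result is only right if the guess about the encoding is.
        text = raw.decode(fallback_encoding)
        used_fallback = True

    with open(path, "wb") as target_file:
        target_file.write(text.encode("utf-8"))
    return used_fallback
```

A return value of True would mean the file was not valid UTF-8 to begin with.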