llvmlite icon indicating copy to clipboard operation
llvmlite copied to clipboard

There are any reason to use 'latin1' for encoding and decoding?

Open dbwodlf3 opened this issue 4 years ago • 5 comments

https://github.com/numba/llvmlite/blob/master/llvmlite/binding/common.py

In using llvmlite.binding.parse_assembly, There are some korean comments in ll file. So can't encode and decode.

If there aren't special reason to use 'latin1', i think it is better to use 'utf-8'.

Thanks.

dbwodlf3 avatar Nov 19 '20 07:11 dbwodlf3

UTF-8 was originally used, but was switched to latin1 in https://github.com/numba/llvmlite/pull/53. If I switch the encoding to UTF-8, the test case that accompanied that PR doesn't fail (on Linux at least, will try shortly on Windows). I'm not sure the reason for the switch to latin1 is still valid, so will post more back here after further investigation.

edit: corrected "latin8" to "latin1"

gmarkall avatar Nov 19 '20 10:11 gmarkall

Update: The test also passes on Windows.

gmarkall avatar Nov 19 '20 12:11 gmarkall

I've opened https://github.com/numba/llvmlite/pull/655 with the change to see what happens on CI.

gmarkall avatar Nov 19 '20 12:11 gmarkall

The CI tests seem to pass with utf-8, so I've pinged @sklam to ask for a further opinion.

gmarkall avatar Nov 19 '20 12:11 gmarkall

The linked PR #655 is now ready for review.

gmarkall avatar Nov 25 '20 15:11 gmarkall