
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Open gabrielsanbrito opened this issue 2 months ago • 4 comments

This happens on Windows. I noticed that a Bikeshed template created with bikeshed template > index.bs is apparently encoded as UTF-16 LE. This causes the following error when I run bikeshed spec index.bs index.html:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "c:\users\<user>\.local\bin\bikeshed.exe\__main__.py", line 6, in <module>
  File "C:\Users\<user>\pipx\venvs\bikeshed\Lib\site-packages\bikeshed\cli.py", line 480, in main
    handleSpec(options, extras)
  File "C:\Users\<user>\pipx\venvs\bikeshed\Lib\site-packages\bikeshed\cli.py", line 527, in handleSpec
    doc = Spec(
          ^^^^^
  File "C:\Users\<user>\pipx\venvs\bikeshed\Lib\site-packages\bikeshed\Spec.py", line 87, in __init__
    self.valid = self.initializeState()
                 ^^^^^^^^^^^^^^^^^^^^^^
ckages\bikeshed\Spec.py", line 125, in initializeState
    self.inputContent = self.inputSource.read()
ckages\bikeshed\InputSource.py", line 205, in read
    f.readlines(),
    ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

To work around it, I have to manually make VSCode re-save the file as UTF-8. Is this expected?
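The "byte 0xff in position 0" detail fits the UTF-16 LE diagnosis: a UTF-16 LE file usually begins with the byte-order mark FF FE, and 0xff never appears anywhere in valid UTF-8. Instead of re-saving by hand in VSCode, a small Python script can do the conversion. This is a sketch, not part of Bikeshed; `reencode_to_utf8` is a hypothetical helper name.

```python
import codecs
from pathlib import Path


def reencode_to_utf8(path: Path) -> bool:
    """Rewrite `path` as UTF-8 if it starts with a UTF-16 byte-order mark.

    Returns True if the file was converted, False if it was left untouched.
    Hypothetical helper, not part of Bikeshed.
    """
    raw = path.read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
        text = raw.decode("utf-16")
        # Write bytes directly so existing line endings (e.g. CRLF) are preserved.
        path.write_bytes(text.encode("utf-8"))
        return True
    return False
```

Usage: reencode_to_utf8(Path("index.bs")) before running bikeshed spec. Files without a UTF-16 BOM are left alone, so running it twice is harmless.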

gabrielsanbrito, Oct 23 '25 02:10

Ooh, that's definitely not intentional! All my file operations explicitly encode to UTF-8, but template just prints to the console, and it looks like the Windows console natively speaks UTF-16LE. However, I'm also seeing that PEP 528 should have changed Python's console behavior to always default to UTF-8, so I'm a little confused. (I've also never run Bikeshed on Windows, only in WSL, where it's Linux anyway.)

I've just committed what I hope might be a fix; I'll cut a release after it passes tests, and you can see if it works better for you.
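The thread doesn't say what the committed fix changes. One common way for a Python CLI to make its stdout UTF-8 regardless of the console or locale default is to call reconfigure(encoding="utf-8") on sys.stdout, which io.TextIOWrapper supports since Python 3.7. The sketch below assumes that approach; `force_utf8` is a hypothetical helper, not Bikeshed's code. It only controls the bytes Python emits: if the shell re-encodes redirected output (Windows PowerShell 5.1's `>` operator, for example, writes UTF-16 LE with a BOM by default), nothing inside the program can prevent that.

```python
import io
from typing import TextIO


def force_utf8(stream: TextIO) -> bool:
    """Switch a text stream to UTF-8 output; return True if it was reconfigured.

    Hypothetical helper: only io.TextIOWrapper (the usual type of sys.stdout)
    has reconfigure(); other stream objects are left untouched.
    """
    if isinstance(stream, io.TextIOWrapper):
        # reconfigure() flushes pending output before switching the encoding.
        stream.reconfigure(encoding="utf-8")
        return True
    return False
```

Usage: call force_utf8(sys.stdout) once at startup, before anything is printed.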

tabatkins, Oct 23 '25 22:10

Argh, sorry for the delay in cutting a release; 5.3.5 has been uploaded now with this attempted fix. I don't have a Windows install that I can easily test on, so hopefully this suffices; if not, I'm gonna have to keep tweaking.

tabatkins, Nov 04 '25 21:11

It still didn't work :( But I looked into it a bit on the web (here), and maybe this is an issue with how the Windows terminal works? Maybe we could add an extra -o option to bikeshed template that writes the template to a file instead of stdout? Then we could create a UTF-8 file directly. That seems more portable/future-proof. WDYT?

gabrielsanbrito, Nov 24 '25 22:11

Yeah, that was gonna be my fallback solution, just annoyed that it looks like this should be fixed and it's not. I'll do the output file thing.

tabatkins, Nov 24 '25 23:11

Okay, as of Bikeshed 7.0.5, bikeshed template accepts an outfile argument. (When you use it, you also need to name the template explicitly: bikeshed template spec myfile.bs.)

HOPEFULLY writing the file directly works fine, since normal Bikeshed output apparently does. Please lmk!
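A minimal sketch of how a template subcommand with an optional output file might be wired up with argparse. Everything here is hypothetical (the TEMPLATES dict and its placeholder text, build_parser, main); it is not Bikeshed's actual code. It illustrates why the template name becomes mandatory once you pass a file: with two optional positionals, a lone argument is taken as the template name.

```python
import argparse
import sys
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical placeholder template; Bikeshed ships its own templates.
TEMPLATES = {
    "spec": "<pre class=metadata>\nTitle: Your Spec Title\n</pre>\n",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bikeshed-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    tmpl = sub.add_parser("template", help="Emit a starter document.")
    # Both positionals are optional, so a single argument fills `name` first.
    tmpl.add_argument("name", nargs="?", default="spec", choices=sorted(TEMPLATES))
    tmpl.add_argument("outfile", nargs="?", help="Write here as UTF-8 instead of stdout.")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    text = TEMPLATES[args.name]
    if args.outfile:
        # Encoding the bytes ourselves pins UTF-8; no shell redirection is involved.
        Path(args.outfile).write_bytes(text.encode("utf-8"))
    else:
        sys.stdout.write(text)
    return 0
```

Usage: main(["template", "spec", "index.bs"]) writes a UTF-8 file, while main(["template", "index.bs"]) exits with an argparse "invalid choice" error, because index.bs was parsed as the template name.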

tabatkins, Dec 15 '25 22:12