toml-cli

Issue with encoding of file in windows

Open aarmn opened this issue 11 months ago • 4 comments

I have a pyproject.toml file and wanted to use this tool to extract some info into a terminal pipeline. On Windows, however, the file is always read as "cp1252", with no way to change the encoding, and the tool doesn't seem to check which encoding the file actually uses. Here is an example of the issue (it only started happening after I added a 🖼️ emoji to the file):

expected output:

> uvx --from toml-cli toml get --toml-path pyproject.toml project.version
0.2.0

current output:

> uvx --from toml-cli toml get --toml-path pyproject.toml project.version
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ C:\Users\a\AppData\Local\uv\cache\archive-v0\cWbLwjkBJn0V9fMPPzLOx\Lib\s │
│ ite-packages\toml_cli\__init__.py:20 in get                                  │
│                                                                              │
│    17 │   default: Optional[str] = typer.Option(None),                       │
│    18 ):                                                                     │
│    19 │   """Get a value from a toml file"""                                 │
│ ❱  20 │   toml_part = tomlkit.parse(toml_path.read_text())                   │
│    21 │                                                                      │
│    22 │   if key is not None:                                                │
│    23 │   │   for key_part in key.split("."):                                │
│                                                                              │
│ ╭───────────────── locals ──────────────────╮                                │
│ │   default = None                          │                                │
│ │       key = 'project.version'             │                                │
│ │ toml_path = WindowsPath('pyproject.toml') │                                │
│ ╰───────────────────────────────────────────╯                                │
│                                                                              │
│ C:\Program Files\Python312\Lib\pathlib.py:1028 in read_text                  │
│                                                                              │
│   1025 │   │   """                                                           │
│   1026 │   │   encoding = io.text_encoding(encoding)                         │
│   1027 │   │   with self.open(mode='r', encoding=encoding, errors=errors) as │
│ ❱ 1028 │   │   │   return f.read()                                           │
│   1029 │                                                                     │
│   1030 │   def write_bytes(self, data):                                      │
│   1031 │   │   """                                                           │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ encoding = 'locale'                                                      │ │
│ │   errors = None                                                          │ │
│ │        f = <_io.TextIOWrapper name='pyproject.toml' mode='r'             │ │
│ │            encoding='cp1252'>                                            │ │
│ │     self = WindowsPath('pyproject.toml')                                 │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ C:\Program Files\Python312\Lib\encodings\cp1252.py:23 in decode              │
│                                                                              │
│    20                                                                        │
│    21 class IncrementalDecoder(codecs.IncrementalDecoder):                   │
│    22 │   def decode(self, input, final=False):                              │
│ ❱  23 │   │   return codecs.charmap_decode(input,self.errors,decoding_table) │
│    24                                                                        │
│    25 class StreamWriter(Codec,codecs.StreamWriter):                         │
│    26 │   pass                                                               │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ final = True                                                             │ │
│ │ input = b'# Schema:                                                      │ │
│ │         https://json.schemastore.org/pyproject.json\n\n[project]\nname = │ │
│ │         "pixelis'+1714                                                   │ │
│ │  self = <encodings.cp1252.IncrementalDecoder object at                   │ │
│ │         0x000001CB62947860>                                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 122:
character maps to <undefined>

aarmn avatar Jan 29 '25 03:01 aarmn

I see name = "pixelis'+1714 in the stack trace, which probably points to an incorrect encoding. Can you share the pyproject.toml?

mrijken avatar Jan 29 '25 06:01 mrijken

Sure. I also looked a bit deeper into it and found that it's probably an issue with how VS Code saves the file, or an issue with pathlib, since you delegate the reading to pathlib. Interestingly, chardet also mistook the file for that Windows encoding instead of UTF-8, which suggests something doesn't match up.

https://limewire.com/d/439763fb-cf02-4dc5-92d5-8de558e069ca#30oM9jFu4pu8YgufHcLF4__bl6QaSS69UINIfrJZ2f4 (I used this random upload site because saving to Pastebin or similar services might have changed the file's original encoding. If the link doesn't work, let me know.)

aarmn avatar Jan 29 '25 17:01 aarmn

I could reproduce it. However, if I use another Unicode emoticon, it works as expected. Maybe it is a Windows encoding issue.
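
One possible explanation for why only some emoji trigger the error, sketched in Python: the 🖼️ emoji is U+1F5BC followed by the variation selector U+FE0F, whose UTF-8 encoding ends in the byte 0x8f, one of the few bytes Python's cp1252 codec leaves undefined. Many other emoji consist only of bytes that cp1252 can map, so they decode without an exception (as mojibake). The helper name `try_cp1252` and the choice of the second emoji are illustrative assumptions, not something taken from the report.

```python
# UTF-8 bytes of two emoji; the first one ends with the variation selector U+FE0F.
framed_picture = "\U0001F5BC\uFE0F".encode("utf-8")  # b'\xf0\x9f\x96\xbc\xef\xb8\x8f'
grinning_face = "\U0001F600".encode("utf-8")         # b'\xf0\x9f\x98\x80'


def try_cp1252(data: bytes) -> str:
    """Decode bytes as cp1252 and report either the result or the offending byte."""
    try:
        return "decoded as " + repr(data.decode("cp1252"))
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return f"failed on byte {data[exc.start]:#04x}"


print(try_cp1252(framed_picture))  # fails: 0x8f has no cp1252 mapping
print(try_cp1252(grinning_face))   # succeeds, but yields mojibake, not the emoji
```

If this is what happens here, the "working" emoji are still being misread; they just don't raise an exception.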

mrijken avatar Jan 29 '25 19:01 mrijken

> I could reproduce it. However, if I use another Unicode emoticon, it works as expected. Maybe it is a Windows encoding issue.

It might be; I'm truly lost as to what it could be. I assume the best way to find out is to check what makes a file UTF-8 (which indicators the standard defines) and whether they are present, or what makes a file cp1252 and whether that is present in this file (it shouldn't be). As a short-term fix, though, a hacky fallback of reading the file as UTF-8 when the default read fails might solve the issue, since UTF-8 is a very common encoding anyway.

Any idea how I can get it fixed upstream (assuming it's Windows/pathlib) or narrow down the scope?
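
A minimal sketch of the short-term fix, assuming the read in `get` can simply pass an explicit encoding. The TOML specification requires files to be valid UTF-8, so reading as UTF-8 directly may be preferable to trying the locale encoding first. The helper `read_toml_text` is hypothetical and not part of toml-cli; the demo file is a throwaway, not the reported pyproject.toml.

```python
import tempfile
from pathlib import Path


def read_toml_text(toml_path: Path) -> str:
    """Hypothetical helper: read a TOML file as UTF-8 regardless of the OS locale."""
    # An explicit encoding bypasses the locale default (cp1252 in the traceback).
    return toml_path.read_text(encoding="utf-8")


# Demo with a throwaway file containing the emoji from the report.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "pyproject.toml"
    sample.write_bytes(
        '[project]\nversion = "0.2.0"  # \U0001F5BC\uFE0F\n'.encode("utf-8")
    )
    text = read_toml_text(sample)

print('version = "0.2.0"' in text)
```

Without changing the tool, setting the environment variable `PYTHONUTF8=1` (Python's UTF-8 mode) before running the command should also make `read_text()` default to UTF-8, though I have not verified this against this exact setup.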

aarmn avatar Jan 30 '25 15:01 aarmn