util: quote sh list safely
subprocess accepts bytes, PathLike obj and str but shlex.quote only accept str type. So need a normalize api.
Can you please add a small test to showcase what exactly this PR fixes? If I understand correctly it is a cosmetic change that makes sure that the printed command on stdout is valid?
Can you please add a small test to showcase what exactly this PR fixes? If I understand correctly it is a cosmetic change that makes sure that the printed command on stdout is valid?
If developer use subprocess api in west extension and the cmd fails with a CalledProcessError, the WestApp class will catch and then print the cmd
https://github.com/zephyrproject-rtos/west/blob/76890ef68124369c9b3ccd1ef968e47ab5f3183f/src/west/app/main.py#L671
subprocess cmd list accepts bytes, str and path like objects but quote_sh_list only accepts str so it may raise another exception in this error handling block.
A real case is https://github.com/zephyrproject-rtos/zephyr/pull/100565
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 85.95%. Comparing base (76890ef) to head (7118ad9).
:warning: Report is 8 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #896 +/- ##
==========================================
+ Coverage 84.93% 85.95% +1.02%
==========================================
Files 11 11
Lines 3444 3453 +9
==========================================
+ Hits 2925 2968 +43
+ Misses 519 485 -34
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/west/util.py | 94.44% <100.00%> (+5.87%) |
:arrow_up: |
The PR was originally to solve following issue:
from pathlib import Path
from west.commands import WestCommand
class SubprocessFail(WestCommand):
def __init__(self):
super().__init__(
'subprocess_fail',
'Test subprocess fail',
'Test subprocess fail')
def do_add_parser(self, parser_adder):
return parser_adder.add_parser(self.name,
help=self.help,
description = self.description)
def do_run(self, args, ignored):
self.check_call(["ls", Path("/nonexistent")])
Then it will fail in exit:
Traceback (most recent call last):
File "/home/yves/Repos/zephyrproject/.venv/bin/west", line 7, in <module>
sys.exit(main())
^^^^^^
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 1199, in main
app.run(argv or sys.argv[1:])
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 278, in run
self.run_command(argv, early_args)
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 584, in run_command
self.run_extension(args.command, argv)
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 739, in run_extension
self.cmd.run(args, unknown, self.topdir, manifest=self.manifest,
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/commands.py", line 200, in run
self.do_run(args, unknown)
File "/home/yves/Repos/zephyrproject/zephyr/scripts/west_commands/subprocess_fail.py", line 17, in do_run
self.check_call(["ls", Path("/nonexistent")])
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/commands.py", line 315, in check_call
self._log_subproc(args, **kwargs)
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/commands.py", line 303, in _log_subproc
self.dbg(f"running '{quote_sh_list(args)}' in "
^^^^^^^^^^^^^^^^^^^
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/util.py", line 44, in quote_sh_list
return ' '.join(shlex.quote(s) for s in cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/util.py", line 44, in <genexpr>
return ' '.join(shlex.quote(s) for s in cmd)
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/shlex.py", line 327, in quote
if _find_unsafe(s) is None:
^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'PosixPath'
I agree with that we can ignore byte sequence because it is uncommon and not allowed under Windows. But I am not sure whether print a command without quote it is a correct way.
# Without quote
ls: cannot access '/nonexistent': No such file or directory
FATAL ERROR: command exited with status 2: ['ls', PosixPath('/nonexistent')]
# With quote
ls: cannot access '/nonexistent': No such file or directory
FATAL ERROR: command exited with status 2: ls /nonexistent
IMO, the second one is more friendly to user to reproduce it. User can get more accurate log with west -vvv
running 'ls /nonexistent' in /home/yves/Repos/zephyrproject/zephyr
ls: cannot access '/nonexistent': No such file or directory
FATAL ERROR: command exited with status 2: ls /nonexistent
=== Traceback (enabled by -vvv):
Traceback (most recent call last):
File "/home/yves/Repos/west/src/west/app/main.py", line 633, in run_command
self.run_extension(args.command, argv)
File "/home/yves/Repos/west/src/west/app/main.py", line 791, in run_extension
self.cmd.run(args, unknown, self.topdir, manifest=self.manifest, config=self.config)
File "/home/yves/Repos/west/src/west/commands.py", line 214, in run
self.do_run(args, unknown)
File "/home/yves/Repos/zephyrproject/zephyr/scripts/west_commands/subprocess_fail.py", line 17, in do_run
self.check_call(["ls", Path("/nonexistent")])
File "/home/yves/Repos/west/src/west/commands.py", line 333, in check_call
subprocess.check_call(args, **kwargs)
File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ls', PosixPath('/nonexistent')]' returned non-zero exit status 2.
The code change looks good to me, thanks! The commit message in https://github.com/zephyrproject-rtos/west/commit/2b3445f91cca37b5e8732b5c44254f71425dc6d9 still does not make sense to me:
subprocess accepts bytes, PathLike obj and str but shlex.quote only accept str type. So need convert PathLike object to str. Ignore byte sequence because it is uncommon and not allowed under Windows.
Where did you see that subprocess accepts bytes?
Do you realize the error message below is NOT from subprocess?
>>> import shlex, pathlib
>>> shlex.quote(pathlib.Path("."))
Traceback (most recent call last):
File "<python-input-1>", line 1, in <module>
shlex.quote(pathlib.Path("."))
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.13/shlex.py", line 327, in quote
if _find_unsafe(s) is None:
~~~~~~~~~~~~^^^
TypeError: expected string or bytes-like object, got 'PosixPath'
(A bit off-topic in this PR)
IMO, the second one is more friendly to user to reproduce it.
Yes it's more user friendly; except when... it's WRONG:
Warning The shlex module is only designed for Unix shells. The quote() function is not guaranteed to be correct on non-POSIX compliant shells or shells from other operating systems such as Windows. Executing commands quoted by this module on such shells can open up the possibility of a command injection vulnerability.
Incorrect is always less user-friendly than correct: much more time-consuming. Also, joining the list of arguments loses debug information.
Do you realize the error message below is NOT from subprocess?
I see, the quote_sh_list was usually used to quote args passed to subprocess, so I just mentioned it.
Incorrect is always less user-friendly than correct: much more time-consuming. Also, joining the list of arguments loses debug information.
For debug, user needs to specify verbose parameters. Otherwise the default console logs look a little duplication.
So do you plan to change the commit message?
So do you plan to change the commit message?
No, it's clear.
It's clear but most of the comments about bytes seem incorrect.
What's do you think about this:
subprocess args can accept str, PathLike, or bytes (POSIX only), but shlex.quote() only works with str.
For safety and cross-platform compatibility, convert PathLike to str and ignore bytes since they are rare and unsupported on Windows.
Where did you see that subprocess accepts bytes?
Still waiting for the answer to that question.
As far as I can see, subprocess accepts strings and class PathLike
Where did you see that subprocess accepts bytes?
Still waiting for the answer to that question.
As far as I can see, subprocess accepts strings and
class PathLike
https://docs.python.org/3/library/subprocess.html
tl;dr: please drop the reference to Windows in the commit message.
For safety and cross-platform compatibility, convert PathLike to str and ignore bytes since they are rare and unsupported on Windows.
-
"Rare" - agreed. Or, better: "discouraged", on every Operating System.
-
"Unsupported on Windows": no, it's not that simple. I did find Windows-specific issues but only when "abusing" the subprocess API a bit. For instance, these encoded bytes work fine across all operating systems, including Windows:
import os, subprocess
subprocess.check_call([os.fsencode('curl'), os.fsencode('--help')], shell=False)
Changed in version 3.8: args parameter accepts a path-like object if shell is False and a sequence containing bytes and path-like objects on Windows.
Yes I saw that. But it's only a changelog, it's the not part of actual API description. Funny how a changelog refers to a non-existing documentation? I did a fair amount of research and reading and even some testing and I think I solved that mystery.
(BTW avoid screenshots when not necessary https://docs.zephyrproject.org/latest/develop/getting_started/index.html#use-copy-paste)
That changelog was about a bug fix for a Windows-specific bug https://github.com/python/cpython/pull/5914/files
I think the most relevant and interesting document is PEP 519. Long story short, the purpose of PEP 519 is to encourage people to use PathLike objects and to discourage people from using encoded strings (bytes).
https://peps.python.org/pep-0519/#os
Another way to view this is as a hierarchy of file system path representations (highest- to lowest-level): path → str → bytes. The functions and classes under discussion can all accept objects on the same level of the hierarchy, but they vary in whether they promote or demote objects to another level.
However, Python could not get rid of bytes for backwards compatibility reasons:
https://peps.python.org/pep-0519/#have-fspath-only-return-strings
Much of the discussion that led to this PEP revolved around whether fspath() should be polymorphic and return bytes as well as str or only return str. The general sentiment for this view was that bytes are difficult to work with due to their inherent lack of information about their encoding [...] In the end, it was decided that using bytes to represent paths is simply not going to go away and thus they should be supported to some degree. The hope is that people will gravitate towards path objects like pathlib and that will move people away from operating directly with bytes.
So, I think this is why bytes are never directly mentioned in the API description; only indirectly as in the changelog, or as some PathLike implementation detail.
So the reason why we don't want to support encoded bytes in this quote function is because no extension should ever use bytes on any operating system. Not because misusing the subprocess API on Windows can sometimes fail in some obscure and barely legal use cases.
Some testing:
# Python 3.12. shell=False is the (sane) default
import os,subprocess
# Bad `shell` usage that works only Windows! Because winapi.CreateProcess() is a pretty bad API
subprocess.check_call('curl --help', shell=False)
# But once encoded, this fails on all operating systems as it should - even Windows. Go figure.
subprocess.check_call(os.fsencode('curl --help'), shell=False)
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
# bytes that works everywhere across all operating systems including Windows:
subprocess.check_call([os.fsencode('curl'), os.fsencode('--help')], shell=False)
# Encoded combination that indeed fails only on Windows - no clear reason why.
# But there is no reason to encode when using `shell=True` anyway.
subprocess.check_call(os.fsencode('curl --help'), shell=True)
TypeError: bytes args is not allowed on Windows
tl;dr: please drop the reference to Windows in the commit message.
Make sense! Thanks for the detailed references.