west icon indicating copy to clipboard operation
west copied to clipboard

util: quote sh list safely

Open yvesll opened this issue 1 month ago • 12 comments

subprocess accepts bytes, PathLike obj and str but shlex.quote only accept str type. So need a normalize api.

yvesll avatar Dec 08 '25 09:12 yvesll

Can you please add a small test to showcase what exactly this PR fixes? If I understand correctly it is a cosmetic change that makes sure that the printed command on stdout is valid?

thorsten-klein avatar Dec 08 '25 09:12 thorsten-klein

Can you please add a small test to showcase what exactly this PR fixes? If I understand correctly it is a cosmetic change that makes sure that the printed command on stdout is valid?

If developer use subprocess api in west extension and the cmd fails with a CalledProcessError, the WestApp class will catch and then print the cmd https://github.com/zephyrproject-rtos/west/blob/76890ef68124369c9b3ccd1ef968e47ab5f3183f/src/west/app/main.py#L671

subprocess cmd list accepts bytes, str and path like objects but quote_sh_list only accepts str so it may raise another exception in this error handling block. A real case is https://github.com/zephyrproject-rtos/zephyr/pull/100565

yvesll avatar Dec 08 '25 09:12 yvesll

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 85.95%. Comparing base (76890ef) to head (7118ad9). :warning: Report is 8 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #896      +/-   ##
==========================================
+ Coverage   84.93%   85.95%   +1.02%     
==========================================
  Files          11       11              
  Lines        3444     3453       +9     
==========================================
+ Hits         2925     2968      +43     
+ Misses        519      485      -34     
Files with missing lines Coverage Δ
src/west/util.py 94.44% <100.00%> (+5.87%) :arrow_up:

... and 3 files with indirect coverage changes

codecov[bot] avatar Dec 08 '25 10:12 codecov[bot]

The PR was originally to solve following issue:

from pathlib import Path
from west.commands import WestCommand

class SubprocessFail(WestCommand):
    def __init__(self):
        super().__init__(
                'subprocess_fail',
                'Test subprocess fail',
                'Test subprocess fail')

    def do_add_parser(self, parser_adder):
        return parser_adder.add_parser(self.name,
                help=self.help,
                description = self.description)

    def do_run(self, args, ignored):
        self.check_call(["ls", Path("/nonexistent")])

Then it will fail in exit:

Traceback (most recent call last):
  File "/home/yves/Repos/zephyrproject/.venv/bin/west", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 1199, in main
    app.run(argv or sys.argv[1:])
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 278, in run
    self.run_command(argv, early_args)
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 584, in run_command
    self.run_extension(args.command, argv)
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/app/main.py", line 739, in run_extension
    self.cmd.run(args, unknown, self.topdir, manifest=self.manifest,
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/commands.py", line 200, in run
    self.do_run(args, unknown)
  File "/home/yves/Repos/zephyrproject/zephyr/scripts/west_commands/subprocess_fail.py", line 17, in do_run
    self.check_call(["ls", Path("/nonexistent")])
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/commands.py", line 315, in check_call
    self._log_subproc(args, **kwargs)
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/commands.py", line 303, in _log_subproc
    self.dbg(f"running '{quote_sh_list(args)}' in "
                         ^^^^^^^^^^^^^^^^^^^
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/util.py", line 44, in quote_sh_list
    return ' '.join(shlex.quote(s) for s in cmd)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yves/Repos/zephyrproject/.venv/lib/python3.12/site-packages/west/util.py", line 44, in <genexpr>
    return ' '.join(shlex.quote(s) for s in cmd)
                    ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/shlex.py", line 327, in quote
    if _find_unsafe(s) is None:
       ^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'PosixPath'

I agree with that we can ignore byte sequence because it is uncommon and not allowed under Windows. But I am not sure whether print a command without quote it is a correct way.

# Without quote
ls: cannot access '/nonexistent': No such file or directory
FATAL ERROR: command exited with status 2: ['ls', PosixPath('/nonexistent')]
# With quote
ls: cannot access '/nonexistent': No such file or directory
FATAL ERROR: command exited with status 2: ls /nonexistent

IMO, the second one is more friendly to user to reproduce it. User can get more accurate log with west -vvv

running 'ls /nonexistent' in /home/yves/Repos/zephyrproject/zephyr
ls: cannot access '/nonexistent': No such file or directory
FATAL ERROR: command exited with status 2: ls /nonexistent
=== Traceback (enabled by -vvv):
Traceback (most recent call last):
  File "/home/yves/Repos/west/src/west/app/main.py", line 633, in run_command
    self.run_extension(args.command, argv)
  File "/home/yves/Repos/west/src/west/app/main.py", line 791, in run_extension
    self.cmd.run(args, unknown, self.topdir, manifest=self.manifest, config=self.config)
  File "/home/yves/Repos/west/src/west/commands.py", line 214, in run
    self.do_run(args, unknown)
  File "/home/yves/Repos/zephyrproject/zephyr/scripts/west_commands/subprocess_fail.py", line 17, in do_run
    self.check_call(["ls", Path("/nonexistent")])
  File "/home/yves/Repos/west/src/west/commands.py", line 333, in check_call
    subprocess.check_call(args, **kwargs)
  File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ls', PosixPath('/nonexistent')]' returned non-zero exit status 2.

yvesll avatar Dec 09 '25 05:12 yvesll

The code change looks good to me, thanks! The commit message in https://github.com/zephyrproject-rtos/west/commit/2b3445f91cca37b5e8732b5c44254f71425dc6d9 still does not make sense to me:

subprocess accepts bytes, PathLike obj and str but shlex.quote only accept str type. So need convert PathLike object to str. Ignore byte sequence because it is uncommon and not allowed under Windows.

Where did you see that subprocess accepts bytes?

Do you realize the error message below is NOT from subprocess?

>>> import shlex, pathlib
>>> shlex.quote(pathlib.Path("."))
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    shlex.quote(pathlib.Path("."))
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/shlex.py", line 327, in quote
    if _find_unsafe(s) is None:
       ~~~~~~~~~~~~^^^
TypeError: expected string or bytes-like object, got 'PosixPath'


(A bit off-topic in this PR)

IMO, the second one is more friendly to user to reproduce it.

Yes it's more user friendly; except when... it's WRONG:

Warning The shlex module is only designed for Unix shells. The quote() function is not guaranteed to be correct on non-POSIX compliant shells or shells from other operating systems such as Windows. Executing commands quoted by this module on such shells can open up the possibility of a command injection vulnerability.

Incorrect is always less user-friendly than correct: much more time-consuming. Also, joining the list of arguments loses debug information.

marc-hb avatar Dec 10 '25 02:12 marc-hb

Do you realize the error message below is NOT from subprocess?

I see, the quote_sh_list was usually used to quote args passed to subprocess, so I just mentioned it.

Incorrect is always less user-friendly than correct: much more time-consuming. Also, joining the list of arguments loses debug information.

For debug, user needs to specify verbose parameters. Otherwise the default console logs look a little duplication.

yvesll avatar Dec 10 '25 02:12 yvesll

So do you plan to change the commit message?

marc-hb avatar Dec 10 '25 02:12 marc-hb

So do you plan to change the commit message?

No, it's clear.

yvesll avatar Dec 10 '25 03:12 yvesll

It's clear but most of the comments about bytes seem incorrect.

marc-hb avatar Dec 10 '25 03:12 marc-hb

What's do you think about this:

subprocess args can accept str, PathLike, or bytes (POSIX only), but shlex.quote() only works with str. 
For safety and cross-platform compatibility, convert PathLike to str and ignore bytes since they are rare and unsupported on Windows.

yvesll avatar Dec 10 '25 03:12 yvesll

Where did you see that subprocess accepts bytes?

Still waiting for the answer to that question.

As far as I can see, subprocess accepts strings and class PathLike

marc-hb avatar Dec 11 '25 02:12 marc-hb

Where did you see that subprocess accepts bytes?

Still waiting for the answer to that question.

As far as I can see, subprocess accepts strings and class PathLike

https://docs.python.org/3/library/subprocess.html image

yvesll avatar Dec 11 '25 03:12 yvesll

tl;dr: please drop the reference to Windows in the commit message.

For safety and cross-platform compatibility, convert PathLike to str and ignore bytes since they are rare and unsupported on Windows.

  • "Rare" - agreed. Or, better: "discouraged", on every Operating System.

  • "Unsupported on Windows": no, it's not that simple. I did find Windows-specific issues but only when "abusing" the subprocess API a bit. For instance, these encoded bytes work fine across all operating systems, including Windows:

import os, subprocess
subprocess.check_call([os.fsencode('curl'),  os.fsencode('--help')], shell=False)

Changed in version 3.8: args parameter accepts a path-like object if shell is False and a sequence containing bytes and path-like objects on Windows.

Yes I saw that. But it's only a changelog, it's the not part of actual API description. Funny how a changelog refers to a non-existing documentation? I did a fair amount of research and reading and even some testing and I think I solved that mystery.

(BTW avoid screenshots when not necessary https://docs.zephyrproject.org/latest/develop/getting_started/index.html#use-copy-paste)

That changelog was about a bug fix for a Windows-specific bug https://github.com/python/cpython/pull/5914/files

I think the most relevant and interesting document is PEP 519. Long story short, the purpose of PEP 519 is to encourage people to use PathLike objects and to discourage people from using encoded strings (bytes).

https://peps.python.org/pep-0519/#os

Another way to view this is as a hierarchy of file system path representations (highest- to lowest-level): path → str → bytes. The functions and classes under discussion can all accept objects on the same level of the hierarchy, but they vary in whether they promote or demote objects to another level.

However, Python could not get rid of bytes for backwards compatibility reasons:

https://peps.python.org/pep-0519/#have-fspath-only-return-strings

Much of the discussion that led to this PEP revolved around whether fspath() should be polymorphic and return bytes as well as str or only return str. The general sentiment for this view was that bytes are difficult to work with due to their inherent lack of information about their encoding [...] In the end, it was decided that using bytes to represent paths is simply not going to go away and thus they should be supported to some degree. The hope is that people will gravitate towards path objects like pathlib and that will move people away from operating directly with bytes.

So, I think this is why bytes are never directly mentioned in the API description; only indirectly as in the changelog, or as some PathLike implementation detail.

So the reason why we don't want to support encoded bytes in this quote function is because no extension should ever use bytes on any operating system. Not because misusing the subprocess API on Windows can sometimes fail in some obscure and barely legal use cases.

Some testing:

 # Python 3.12. shell=False is the (sane) default
 import os,subprocess

 # Bad `shell` usage that works only Windows! Because winapi.CreateProcess() is a pretty bad API
 subprocess.check_call('curl --help', shell=False)

 # But once encoded, this fails on all operating systems as it should - even Windows. Go figure.
 subprocess.check_call(os.fsencode('curl --help'), shell=False)
       hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       FileNotFoundError: [WinError 2] The system cannot find the file specified

 # bytes that works everywhere across all operating systems including Windows:
 subprocess.check_call([os.fsencode('curl'),  os.fsencode('--help')], shell=False)

 # Encoded combination that indeed fails only on Windows - no clear reason why.
 # But there is no reason to encode when using `shell=True` anyway.
 subprocess.check_call(os.fsencode('curl --help'), shell=True)
 TypeError: bytes args is not allowed on Windows

marc-hb avatar Dec 15 '25 22:12 marc-hb

tl;dr: please drop the reference to Windows in the commit message.

Make sense! Thanks for the detailed references.

yvesll avatar Dec 16 '25 05:12 yvesll