readthedocs.org icon indicating copy to clipboard operation
readthedocs.org copied to clipboard

Fail to run `build.jobs` with single quotes (`'`) in commands

Open XuehaiPan opened this issue 2 years ago • 15 comments

Details

  • Read the Docs project URL: https://readthedocs.org/projects/nvitop/
  • Build URL (if applicable): https://readthedocs.org/projects/nvitop/builds/17338573/
  • Read the Docs username (if applicable): https://readthedocs.org/profiles/XuehaiPan/

Expected Result

A description of what you wanted to happen

Run build.jobs command successfully

My .readthedocs.yaml:

# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.8"
  jobs:
    post_install:
      - sed -i -E 's/^     process identity for every yielded instance$/  \0/' "$(python3 -c "print(__import__('psutil').__file__)")"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  builder: html
  configuration: docs/source/conf.py
  fail_on_warning: true

# Optionally declare the Python requirements required to build your docs
python:
  install:
    - requirements: requirements.txt
    - requirements: docs/requirements.txt
  system_packages: true

I want to run command:

sed -i -E 's/^     process identity for every yielded instance$/  \0/' "$(python3 -c "print(__import__('psutil').__file__)")"

after installing the dependencies.

Actual Result

A description of what actually happened

Got errors in raw output:

[rtd-command-info] start-time: 2022-07-02T10:39:42.282806Z, end-time: 2022-07-02T10:39:42.362971Z, duration: 0, exit-code: 1
sed -i -E 's/^ process identity for every yielded instance$/ \0/' "$(python3 -c "print(__import__('psutil').__file__)")"
sed: -e expression #1, char 3: unterminated `s' command

In:

https://github.com/readthedocs/readthedocs.org/blob/62effc771a6171f9252dd8bbaf779cf8c7d0a975/readthedocs/doc_builder/director.py#L339-L341

we split the command with whitespace into a list of command-line options. In my use case, I have:

>>> command = r'''sed -i -E 's/^     process identity for every yielded instance$/  \0/' "$(python3 -c "print(__import__('psutil').__file__)")"'''
>>> print(command)
sed -i -E 's/^     process identity for every yielded instance$/  \0/' "$(python3 -c "print(__import__('psutil').__file__)")"

>>> command.split()  # split with whitespace
[
    'sed',
    '-i',
    '-E',
    "'s/^",
    'process',
    'identity',
    'for',
    'every',
    'yielded',
    'instance$/',
    "\\0/'",
    '"$(python3',
    '-c',
    '"print(__import__(\'psutil\').__file__)")"'
]

Expected split:

[
    'sed',
    '-i',
    '-E',
    "'s/^     process identity for every yielded instance$/  \\0/'",
    '"$(python3 -c "print(__import__(\'psutil\').__file__)")"'
]

XuehaiPan avatar Jul 02 '22 10:07 XuehaiPan

Hi @XuehaiPan! By using double quotes (") instead of single quotes (') you can work around this issue.

sed -i -E "s/^     process identity for every yielded instance$/  \0/" "$(python3 -c "print(__import__('psutil').__file__)")"

I did a quick test in https://readthedocs.org/projects/test-builds/builds/17348985/ and ~~it worked~~. It passed the build, but it removed the initial spaces from the sed replacement string 😞

humitos avatar Jul 04 '22 11:07 humitos

We could use shlex.split instead of regular split. It splits the command as we want, but it removes the ' so the command fails anyways:

In [1]: import shlex

In [2]: command = r'''sed -i -E 's/^     process identity for every yielded instance$/  \0/' "$(python3 -c "print(__import__('p
   ...: sutil').__file__)")"'''

In [3]: shlex.split(command)
Out[3]: 
['sed',
 '-i',
 '-E',
 's/^     process identity for every yielded instance$/  \\0/',
 '$(python3 -c print(__import__(psutil).__file__))']

Even with double quotes ("):

In [4]: command = r'''sed -i -E "s/^     process identity for every yielded instance$/  \0/" "$(python3 -c "print(__import__('p
   ...: sutil').__file__)")"'''

In [5]: shlex.split(command)
Out[5]: 
['sed',
 '-i',
 '-E',
 's/^     process identity for every yielded instance$/  \\0/',
 '$(python3 -c print(__import__(psutil).__file__))']

That said, I'm not convinced that shlex.split is better than regular split since escaping the quotes also breaks. I'm curious how other CIs like GitHub Actions or CircleCI handle this. Do you know?

humitos avatar Jul 04 '22 11:07 humitos

As a workaround, I created a single Bash script fix-psutil-docstring.sh:

#!/usr/bin/env bash

# shellcheck disable=SC2312
exec sed -i -E 's/^     process identity for every yielded instance$/  \0/' "$(python3 -c "print(__import__('psutil').__file__)")"

Then put:

build:
  os: ubuntu-22.04
  tools:
    python: "3.8"
  jobs:
    post_install:
      - bash docs/source/fix-psutil-docstring.sh

in .readthedocs.yaml.

XuehaiPan avatar Jul 04 '22 13:07 XuehaiPan

@humitos

The BuildEnvironment accepts both str and List[str] as command:

https://github.com/readthedocs/readthedocs.org/blob/6bf0bede7b757f1e9458e29ba89b591389cae4d5/readthedocs/doc_builder/environments.py#L39-L62

It seems unnecessary to word splitting.


We split the command in

https://github.com/readthedocs/readthedocs.org/blob/6bf0bede7b757f1e9458e29ba89b591389cae4d5/readthedocs/doc_builder/director.py#L334-L336

Then join them again before executing:

https://github.com/readthedocs/readthedocs.org/blob/6bf0bede7b757f1e9458e29ba89b591389cae4d5/readthedocs/doc_builder/environments.py#L218-L222

XuehaiPan avatar Jul 04 '22 13:07 XuehaiPan

EDIT to https://github.com/readthedocs/readthedocs.org/issues/9397#issuecomment-1173809071:

Only run BuildEnvironment with argument shell=True will join the command parts into a single str:

https://github.com/readthedocs/readthedocs.org/blob/6bf0bede7b757f1e9458e29ba89b591389cae4d5/readthedocs/doc_builder/environments.py#L148-L151

We can change:

     commands = getattr(self.data.config.build.jobs, job, [])
     for command in commands:
-         environment.run(*command.split(), escape_command=False, cwd=cwd)
+         environment.run(['bash', '-c', command], escape_command=False, cwd=cwd) 

Note that environment.run(..., shell=True) won't work because it will remove continuous whitespaces.

XuehaiPan avatar Jul 04 '22 13:07 XuehaiPan

I'm not 100% sure about the potential issues of using shell=True here. @agjohnson @ericholscher are you aware of them?

@ericholscher I saw you put this in my sprint, but I think I marked it as low priority since it involves some research about potential security issues by breaking out the Docker container and also because there is a workaround already (put the command inside a .sh file).

humitos avatar Feb 07 '23 13:02 humitos

I thought it was a somewhat simple bug fix when I saw it, related to the builders. I definitely don't want to run things with shell=True, if that's what is required.

ericholscher avatar Feb 07 '23 16:02 ericholscher

This is related to #10103 and it may be fixed by the PR associated as well.

humitos avatar Mar 07 '23 16:03 humitos

This error happens because we wrap the command as

f"/bin/bash -c '{command}'"

https://github.com/readthedocs/readthedocs.org/blob/bc4bceda2a55497ee07a5a470f1a0d54200c6873/readthedocs/doc_builder/environments.py#L386-L401

So, using ' inside user's commands is tricky. Knowing this, you should be able to re-write your command to avoid using ' or to escape them.

humitos avatar Mar 16 '23 14:03 humitos

So, using ' inside user's commands is tricky. Knowing this, you should be able to re-write your command to avoid using ' or to escape them.

This is not a great solution for the user. We should find a better way to do this. Seems like there should be a proper way to handle this? Worst case, we can write the command to a file and run it /bin/bash script.sh?

ericholscher avatar Mar 16 '23 14:03 ericholscher

@ericholscher yeah, I know it's not the best solution 😄 . I commented here since I found the exact reason of the problem now. From there, we can figure it out how to solve it and how to provide better workarounds.

humitos avatar Mar 17 '23 13:03 humitos

This error happens because we wrap the command as

f"/bin/bash -c '{command}'"

Should be documented with a .. warning

Here's the gem I had to write to circumvent it:

    - find $READTHEDOCS_OUTPUT/html -type f -iname "*.html" -exec sh -c "echo ln -s \$(basename \"\$1\") \$(dirname \"\$1\")/\$(basename \"\$1\" .html)" sh {} \;

LecrisUT avatar Mar 14 '24 15:03 LecrisUT

It'd be great if this could be fixed. Depending on the command you run, having to rely only on double quotes makes things much uglier.

ThiefMaster avatar Mar 27 '24 10:03 ThiefMaster

@ThiefMaster you can put your command/script into a file and run it from Read the Docs as mentioned in https://github.com/readthedocs/readthedocs.org/issues/9397#issuecomment-1173795211 as a workaround.

humitos avatar Mar 27 '24 11:03 humitos

would be overkill for this, but good to know it's possible

https://github.com/indico/indico/commit/420ddfb8d42f142f67ca94957ee3a87e0df70e21

    pre_install:
      # RTD python versions are usually behind a bit...
      - sed -i -E "/^python_requires = /d" setup.cfg

ThiefMaster avatar Mar 27 '24 11:03 ThiefMaster