Navigate jobs do not work in sandboxed Linux environments (Docker, LXC, Namespaces, ...)
When I run urlwatch in a Docker container with a job that has navigate set, I get the following error:
Exception while releasing resources for job: <browser navigate='<URL>' name='<JOBNAME>' filter=[{'xpath': '/html/body/div[4]/div[4]/form/div/div[3]/table/tbody'}, {'grepi': 'google-query'}, {'html2text': {'method': 'pyhtml2text', 'unicode_snob': True, 'body_width': 0, 'inline_links': False, 'ignore_links': True, 'ignore_images': True, 'pad_tables': False, 'single_line_break': True}}]>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urlwatch/handler.py", line 77, in __exit__
self.job.main_thread_exit()
File "/usr/local/lib/python3.6/dist-packages/urlwatch/jobs.py", line 377, in main_thread_exit
self.ctx.close()
AttributeError: 'BrowserJob' object has no attribute 'ctx'
===========================================================================
01. ERROR: <JOBNAME>
===========================================================================
---------------------------------------------------------------------------
ERROR: <JOBNAME AND URL>
---------------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urlwatch/handler.py", line 67, in __enter__
self.job.main_thread_enter()
File "/usr/local/lib/python3.6/dist-packages/urlwatch/jobs.py", line 374, in main_thread_enter
self.ctx = BrowserContext()
File "/usr/local/lib/python3.6/dist-packages/urlwatch/browser.py", line 86, in __init__
BrowserContext._BROWSER_LOOP = BrowserLoop()
File "/usr/local/lib/python3.6/dist-packages/urlwatch/browser.py", line 45, in __init__
self._browser = self._event_loop.run_until_complete(self._launch_browser())
File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/usr/local/lib/python3.6/dist-packages/urlwatch/browser.py", line 51, in _launch_browser
browser = yield from pyppeteer.launch()
File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 305, in launch
return await Launcher(options, **kwargs).launch()
File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 166, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 225, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
---------------------------------------------------------------------------
--
urlwatch 2.21, Copyright 2008-2020 Thomas Perl
Website: https://thp.io/2008/urlwatch/
watched 1 URLs in 30 seconds
I googled the error and found that Chromium needs the --no-sandbox flag when it runs as root. However, I get the same error even after switching to a non-root user. Also, there is no way to pass --no-sandbox to the Chromium call that urlwatch makes.
Has anybody else faced the same problem?
I cloned the repo and added --no-sandbox to the launcher options; still no luck.
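For anyone who wants to reproduce that experiment outside urlwatch, here is a minimal sketch of passing the flag straight to pyppeteer. It is not the actual patch applied to urlwatch; the helper names build_launch_options and try_launch are made up for illustration, and --disable-setuid-sandbox is an extra flag that is often paired with --no-sandbox rather than something mentioned in this thread.

```python
import asyncio


def build_launch_options(no_sandbox=True):
    """Build keyword arguments for pyppeteer.launch() (hypothetical helper)."""
    args = []
    if no_sandbox:
        # Chromium refuses to start its sandbox as root and in some containers.
        args += ["--no-sandbox", "--disable-setuid-sandbox"]
    return {"args": args}


async def try_launch():
    """Launch Chromium once and return its version string."""
    # Imported lazily so the helper above can be used without pyppeteer.
    import pyppeteer

    browser = await pyppeteer.launch(**build_launch_options())
    try:
        return await browser.version()
    finally:
        await browser.close()
```

Running try_launch() on an event loop (for example with loop.run_until_complete on Python 3.6) either returns the browser version or raises the same BrowserError as in the traceback above, which shows whether the flag alone makes a difference.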
Have you tried running the Docker container as a privileged container?
Just tried, got the same error.
For context, here is my Dockerfile:
FROM ubuntu:18.04
WORKDIR /
ENV GMAIL_EMAIL GMAIL_EMAIL
ENV GMAIL_PASSWORD GMAIL_PASSWORD
ENV PYPPETEER_HOME /home/theuser/.pyppeteer
RUN apt-get update
RUN apt-get install -y python3 python3-pip
RUN pip3 install urlwatch html2text pyppeteer keyring keyrings.alt && \
mkdir -p /home/theuser/.pyppeteer
RUN pyppeteer-install
COPY urls.yaml /home/theuser/.urlwatch/urls.yaml
COPY urlwatch.yaml /home/theuser/.urlwatch/urlwatch.yaml
COPY run.sh /home/theuser/.urlwatch/run.sh
# Added this section after getting the error. Tried `--privileged` with and without this section, got the same error.
RUN groupadd theuser && useradd -g theuser -s /bin/bash -G audio,video theuser \
&& mkdir -p /home/theuser && chown -R theuser:theuser /home/theuser
USER theuser
ENTRYPOINT ["/home/theuser/.urlwatch/run.sh"]
The content of run.sh:
#!/usr/bin/env bash
set -eE
sed -i -e "s/[email protected]/${GMAIL_EMAIL}/" /home/theuser/.urlwatch/urlwatch.yaml
sed -i -e "s/mylittlepassword/${GMAIL_PASSWORD}/" /home/theuser/.urlwatch/urlwatch.yaml
echo "--- STARTING --- $(date -d @1234567890)"
urlwatch --urls /home/theuser/.urlwatch/urls.yaml --config /home/theuser/.urlwatch/urlwatch.yaml
I'm using the same urlwatch.yaml and urls.yaml on my local machine (macOS) and it works fine.
This breaks for me in exactly the same way on a fresh Ubuntu 20.04 instance, no Docker involved.
What do you mean by "Ubuntu 20.04 instance"? If "instance" here means some server that might use namespaces (LXC/Docker), then "Docker" (or the technologies that Docker uses) is basically involved as well. Running Ubuntu 20.04 on bare metal should be no problem.
To be more precise, the feature Chrome uses for sandboxing doesn't work properly inside certain Linux namespaces (the feature that makes Docker, LXC, and others tick), which is why people are seeing issues there.
Hi, apologies, I was an idiot and was missing xorg dependencies. I installed xorg and it seems to be working now. It doesn't help that pyppeteer currently swallows Chrome errors; the instructions in this post helped me debug the problem: https://github.com/pyppeteer/pyppeteer/issues/108#issuecomment-642034439
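One way to spot missing dependencies like this is to ask ldd which shared libraries the Chromium binary cannot resolve. The sketch below is an assumption about how such a check could look, not a reproduction of the linked instructions; missing_libraries and check_binary are hypothetical helper names.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # A line looks like: "libX11.so.6 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on the binary at `path` and list unresolved libraries."""
    result = subprocess.run(
        ["ldd", str(path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
    )
    return missing_libraries(result.stdout)
```

The path to pass to check_binary can be taken from pyppeteer.chromium_downloader.chromium_executable(). Passing dumpio=True to pyppeteer.launch() is another option, since it forwards the browser's own stdout and stderr instead of hiding them.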