undetected-chromedriver
User data directory is already in use while running in docker
Hi, I am trying to run this package without a visible display (using Xvfb, because the meet.google website detects the headless option if I set it in Selenium). The package needs to run in a Docker container, but I cannot make it work.
My OS is Ubuntu 20.04. My current options are as follows:
# Imports the snippet appears to rely on (not shown in the original post)
from pyvirtualdisplay import Display
import undetected_chromedriver.v2 as uc


def uc_selenium_test():
    # Start a virtual X display (Xvfb) so Chrome can run non-headless
    display = Display()
    display.start()
    options = uc.ChromeOptions()
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-setuid-sandbox")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-infobars")
    options.add_argument("--disable-plugins-discovery")
    options.add_argument("--disable-dev-shm-usage")
    options.user_data_dir = "/bot/google-chrome"
    print("Initializing webdriver :)))")
    browser = uc.Chrome(
        options=options, service_log_path='log')
    print("Halleluyah")
    print("Accessing page")
    browser.get("https://google.com")
    print(browser.page_source)
    display.stop()
I've managed to get through this issue: https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/111 with small change in the package:
- if not options.debugger_address:
- options.debugger_address = debug_addr
+ #if not options.debugger_address:
+ # options.debugger_address = debug_addr
But after this fix I am getting the following error:
removing profile : /tmp/tmpzbmyd_k_
Traceback (most recent call last):
File "google_meet_bot/tests/selenium_test.py", line 37, in <module>
uc_selenium_test()
File "google_meet_bot/tests/selenium_test.py", line 27, in uc_selenium_test
browser = uc.Chrome(
File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/undetected_chromedriver/v2.py", line 304, in __init__
super(Chrome, self).__init__(
File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
RemoteWebDriver.__init__(
File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/undetected_chromedriver/v2.py", line 579, in start_session
super(Chrome, self).start_session(capabilities, browser_profile)
File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: user data directory is already in use, please specify a unique value for --user-data-dir argument, or don't use --user-data-dir
Here are additional logs from the Selenium service log (last two entries only, for brevity):
[1628001570.898][INFO]: Launching chrome: /usr/bin/google-chrome --allow-pre-commit-input --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-hang-monitor --disable-infobars --disable-plugins-discovery --disable-popup-blocking --disable-prompt-on-repost --disable-setuid-sandbox --disable-sync --enable-automation --enable-blink-features=ShadowDOMV0 --enable-logging --lang=en-US --log-level=0 --no-first-run --no-sandbox --no-service-autorun --password-store=basic --remote-debugging-host=127.0.0.1 --remote-debugging-port=49259 --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/tmp7b1nc7xd
[1628001571.926][INFO]: [c425997c3700079b1ce8c907274df291] RESPONSE InitSession ERROR invalid argument: user data directory is already in use, please specify a unique value for --user-data-dir argument, or don't use --user-data-dir
Interestingly enough, if I add options.add_argument("--headless") to the setup options, the error disappears.
Do you know what might be causing that, @ultrafunkamsterdam? Any hints or tips on what I should look at? Any help will be greatly appreciated!
I believe the issue is related to https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/217
This change:
- if not options.debugger_address:
- options.debugger_address = debug_addr
+ #if not options.debugger_address:
+ # options.debugger_address = debug_addr
makes the Chrome subprocess started by UC useless. The purpose of v2 is to attach chromedriver to a Chrome instance that was not started by chromedriver; for that, options.debugger_address must be set to the address of an already running Chrome instance. This is called the remote Chrome approach, and it avoids several fingerprints that may be added when Chrome is started by chromedriver.
You now have 2 instances of Chrome running, both set to use the same user_data_dir, so that is probably why the user profile is reported as already in use.
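The remote Chrome approach described above can be sketched roughly as follows. This is a minimal illustration with plain Selenium, not UC's actual implementation: the helper names free_port, build_chrome_command and attach, and the default binary path, are assumptions made for the example. Each call creates a fresh profile directory, so two Chrome instances never share one.

```python
import socket
import tempfile
from typing import List, Tuple


def free_port() -> int:
    """Ask the OS for a TCP port on localhost that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def build_chrome_command(binary: str = "/usr/bin/google-chrome") -> Tuple[List[str], str]:
    """Return (command line, debugger address) for a standalone Chrome.

    Hypothetical helper: every call gets its own temporary profile
    directory, so no two instances use the same --user-data-dir.
    """
    port = free_port()
    profile_dir = tempfile.mkdtemp(prefix="chrome-profile-")
    command = [
        binary,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
        "--no-first-run",
    ]
    return command, f"127.0.0.1:{port}"


def attach(debugger_address: str):
    """Attach chromedriver to the already running Chrome instance."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # With debugger_address set, chromedriver connects to the existing
    # browser instead of launching a second one.
    options.debugger_address = debugger_address
    return webdriver.Chrome(options=options)


# Usage (requires Chrome, chromedriver and Selenium):
#   import subprocess
#   command, address = build_chrome_command()
#   chrome_process = subprocess.Popen(command)
#   driver = attach(address)
```

If debugger_address is left unset, as in the patched package, chromedriver launches its own Chrome, which matches the two-instance situation described above.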
Interesting, so how do I make those 2 instances of Chrome use different user_data_dir, since there is only one argument for the user directory that is shared among those 2 processes?
@KacperKubara I think you missed the point. UC v2 starts a Chrome instance through subprocess.Popen() so that chromedriver can connect to it through options.debugger_address. If you don't set the debugger address, chromedriver itself will start another Chrome instance and ignore the one started by UC v2. If you can't use the subprocess started by v2, don't use v2, use v1:
import undetected_chromedriver as uc
driver = uc.Chrome()
Thanks @HMaker, I will try it out after I come back from holidays! If you happen to have a working Dockerfile with a sample Python script that you can share, it would be really helpful. I've been pulling my hair out over this issue for the past week 🤔
I run it in headless mode, but it seems you want to use virtual displays. As I suspected, the problem was not in Docker: the subprocess is not an issue, since Docker allows multiple processes per container. The issue was wrong settings for UC v2. I use the following Dockerfile for headless:
FROM python:3.8.11-slim-buster
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
# install chrome
RUN apt-get update \
    && apt-get install -y --no-install-recommends wget \
    && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
    && apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb \
    && rm google-chrome-stable_current_amd64.deb \
    && chown root.root -R /opt/google/chrome/ \
    && chmod 755 -R /opt/google/chrome/ \
    && chmod 4755 -R /opt/google/chrome/chrome-sandbox
# install python dependencies ...
RUN python -m venv /appenv
ENV VIRTUAL_ENV /appenv
ENV PATH /appenv/bin:$PATH
# maybe RUN pip install -r requirements.txt
# install chromedriver
ENV CHROMEDRIVER_PATH /appenv/chromedriver
RUN python -c "from undetected_chromedriver.patcher import Patcher; patcher = Patcher(executable_path='$CHROMEDRIVER_PATH', version_main=$(google-chrome --version | grep -Po ' [0-9]{1,3}\.' | sed 's:^.\(.*\).$:\1:')); assert patcher.auto()"
RUN chmod +x /appenv/chromedriver
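The grep/sed pipeline in the Dockerfile above extracts Chrome's major version for version_main. For readers who find the shell one-liner hard to follow, here is a rough Python equivalent. The function name chrome_major_version is made up for this example, and the version string in the comment is only an illustrative sample.

```python
import re


def chrome_major_version(version_output: str) -> int:
    """Extract the major version from `google-chrome --version` output.

    Example input (illustrative only): "Google Chrome 92.0.4515.131 "
    """
    # The first "<digits>.<digits>" run is the start of the version number.
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


# Usage (requires Chrome to be installed):
#   import subprocess
#   output = subprocess.check_output(["google-chrome", "--version"], text=True)
#   major = chrome_major_version(output)
```

The result could then be passed as version_main instead of computing it in the shell.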
Then in Python you can use:
import os
import undetected_chromedriver.v2 as uc
options = uc.ChromeOptions()
options.add_argument('--disable-gpu') # for headless
options.add_argument('--disable-dev-shm-usage') # uses /tmp for memory sharing
# disable popups on startup
options.add_argument('--no-first-run')
options.add_argument('--no-service-autorun')
options.add_argument('--no-default-browser-check')
options.add_argument('--password-store=basic')
# if you set options.user_data_dir make sure it won't be used by multiple chrome instances...
# if it's not set UC will create a random temp folder for it.
chrome = uc.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'), headless=True, options=options)
If you experience "chrome not reachable" in non-headless mode, see #268 for instructions on how to debug it.
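Regarding the comment about options.user_data_dir in the snippet above: if you do want to set it yourself, one way to guarantee that no two Chrome instances share a profile is to create a fresh temporary directory per run and delete it afterwards. A minimal sketch follows; the unique_profile_dir helper is hypothetical and not part of UC.

```python
import contextlib
import shutil
import tempfile


@contextlib.contextmanager
def unique_profile_dir(prefix: str = "uc-profile-"):
    """Yield a fresh profile directory and remove it on exit."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # Chrome may leave files behind; ignore errors during cleanup.
        shutil.rmtree(path, ignore_errors=True)


# Usage (requires undetected_chromedriver and Chrome):
#   import undetected_chromedriver.v2 as uc
#   with unique_profile_dir() as profile:
#       options = uc.ChromeOptions()
#       options.user_data_dir = profile
#       chrome = uc.Chrome(headless=True, options=options)
#       try:
#           chrome.get("https://google.com")
#       finally:
#           chrome.quit()
```

Quitting the browser before the with block ends matters, because the directory is removed as soon as the block exits.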
Thanks, I will try it out. I think I actually managed to run chromedriver in headless mode. However, I need to run it in non-headless mode with a virtual screen, since meet.google.com can detect headless mode :( After I made headless mode work, the non-headless option gives me this error. Has anyone encountered that?
dockerfile_test1.py dockerfile_test2.py google-chrome-stable_current_amd64.deb
root@5b486aad4d13:/docker_tmp# python dockerfile_test2.py
[876:925:0807/131757.864432:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[876:925:0807/131757.864475:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[922:922:0807/131757.865525:ERROR:gpu_init.cc(441)] Passthrough is not supported, GL is swiftshader
[876:925:0807/131757.875476:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[876:925:0807/131757.875516:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[876:989:0807/131757.909412:ERROR:object_proxy.cc(622)] Failed to call method: org.freedesktop.DBus.Properties.Get: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.UPower was not provided by any .service files
[876:989:0807/131757.909553:ERROR:object_proxy.cc(622)] Failed to call method: org.freedesktop.UPower.GetDisplayDevice: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.UPower was not provided by any .service files
[876:989:0807/131757.909690:ERROR:object_proxy.cc(622)] Failed to call method: org.freedesktop.UPower.EnumerateDevices: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.UPower was not provided by any .service files
[0807/131802.723241:ERROR:nacl_helper_linux.cc(307)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox
@KacperKubara pass options.add_argument("--no-sandbox"), because UC v2 sets that option only for headless mode (I wonder why). Disabling the sandbox is bad and only a workaround, but I could not find a way to run Chrome with the sandbox enabled. I tried setting up the sandbox by following https://chromium.googlesource.com/chromium/src/+/refs/heads/main/docs/linux/suid_sandbox_development.md
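As a rough aid for the sandbox question, the sketch below checks the file-level prerequisites that the Dockerfile earlier in this thread tries to set up (chrome-sandbox owned by root with the setuid bit, i.e. mode 4755) and falls back to --no-sandbox when they are missing. The helper names are invented for the example. Passing this check does not guarantee that the sandbox actually works inside a container, which is the problem described above.

```python
import os
import stat
from typing import List

# Path taken from the Dockerfile earlier in this thread.
DEFAULT_SANDBOX_PATH = "/opt/google/chrome/chrome-sandbox"


def suid_sandbox_configured(path: str = DEFAULT_SANDBOX_PATH) -> bool:
    """Return True if the helper exists, is owned by root and is setuid."""
    try:
        info = os.stat(path)
    except OSError:
        return False
    return info.st_uid == 0 and bool(info.st_mode & stat.S_ISUID)


def sandbox_flags(path: str = DEFAULT_SANDBOX_PATH) -> List[str]:
    """Return the extra Chrome flags to use: none, or the workaround."""
    return [] if suid_sandbox_configured(path) else ["--no-sandbox"]


# Usage:
#   for flag in sandbox_flags():
#       options.add_argument(flag)
```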
Hi,
Do you have a version for arm64? I searched the internet, but I could not find a stable Google Chrome .deb for arm64.
@jeromegallego68 Google does not build Chrome for ARM64, except for Android phones. All you can do is use Chromium built for ARM64; you can find it in the Ubuntu repositories as the chromium-browser package. Chrome is closed source; Chromium is its open-source core.
@HMaker Thanks, your answer confirms what I found in my research.
In the end I decided to run my Docker container under amd64 emulation (my local machine has an arm64 processor). That seems to work, but it cannot find the chromedriver.
Here is the error:
------
> [7/9] RUN python -c "from undetected_chromedriver.patcher import Patcher; patcher = Patcher(executable_path='/appenv/chromedriver', version_main=$(google-chrome --version | grep -Po ' [0-9]{1,3}\.' | sed 's:^.\(.*\).$:\1:')); assert patcher.auto()":
#11 2.526 Traceback (most recent call last):
#11 2.526 File "<string>", line 1, in <module>
#11 2.527 File "/appenv/lib/python3.8/site-packages/undetected_chromedriver/patcher.py", line 91, in auto
#11 2.529 ispatched = self.is_binary_patched(self.executable_path)
#11 2.529 File "/appenv/lib/python3.8/site-packages/undetected_chromedriver/patcher.py", line 209, in is_binary_patched
#11 2.529 with io.open(executable_path, "rb") as fh:
#11 2.529 FileNotFoundError: [Errno 2] No such file or directory: '/appenv/chromedriver'
------
Do you have an idea how to fix it? I guess I'm really close to succeeding.
@HMaker here is my Dockerfile, maybe something is wrong here:
# As Scrapy runs on Python, I choose the official Python 3 Docker image.
FROM --platform=linux/amd64 python:3.8.11-slim-buster
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
# install chrome
RUN apt-get update \
    && apt-get install -y --no-install-recommends wget \
    && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
    && apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb \
    && rm google-chrome-stable_current_amd64.deb \
    && chown root.root -R /opt/google/chrome/ \
    && chmod 755 -R /opt/google/chrome/ \
    && chmod 4755 -R /opt/google/chrome/chrome-sandbox
# install python dependencies ...
RUN python -m venv /appenv
RUN /appenv/bin/python -m pip install --upgrade pip
ENV VIRTUAL_ENV /appenv
ENV PATH /appenv/bin:$PATH
# maybe RUN pip install -r requirements.txt
# Copy the file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./
# Install Scrapy specified in requirements.txt.
RUN pip install -r requirements.txt
# install chromedriver
ENV CHROMEDRIVER_PATH /appenv/chromedriver
RUN python -c "from undetected_chromedriver.patcher import Patcher; patcher = Patcher(executable_path='$CHROMEDRIVER_PATH', version_main=$(google-chrome --version | grep -Po ' [0-9]{1,3}\.' | sed 's:^.\(.*\).$:\1:')); assert patcher.auto()"
RUN chmod +x /appenv/chromedriver
# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .
# Run the crawler when the container launches.
CMD [ "python3", "./go-spider.py" ]
Wondering if anyone has tried the remote Chrome approach and was successful.