
User data directory is already in use while running in docker

Open KacperKubara opened this issue 2 years ago • 13 comments

Hi, I am trying to run this package in headless mode (using Xvfb, because the meet.google website detects it if I specify the headless option in Selenium). The package needs to run in a Docker container, but I cannot make it work.

My OS is Ubuntu 20.04. My current options are as follows:

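import undetected_chromedriver.v2 as uc  # assumed import; the traceback below points at v2.py
from pyvirtualdisplay import Display  # assumed source of Display (Xvfb wrapper)

def uc_selenium_test():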
    display = Display()
    display.start()
    options = uc.ChromeOptions()
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-setuid-sandbox")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-infobars")
    options.add_argument("--disable-plugins-discovery")
    options.add_argument("--disable-dev-shm-usage")
    options.user_data_dir = "/bot/google-chrome"
    
    print("Initializing webdriver :)))")
    browser = uc.Chrome(options=options, service_log_path='log')
    print("Halleluyah")
    print("Accessing page")
    browser.get("https://google.com")
    print(browser.page_source)
    display.stop()

I've managed to get through this issue: https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/111 with small change in the package:

-        if not options.debugger_address:
-            options.debugger_address = debug_addr
+        #if not options.debugger_address:
+        #    options.debugger_address = debug_addr

But after this change I am getting the following error:

removing profile : /tmp/tmpzbmyd_k_
Traceback (most recent call last):
  File "google_meet_bot/tests/selenium_test.py", line 37, in <module>
    uc_selenium_test()
  File "google_meet_bot/tests/selenium_test.py", line 27, in uc_selenium_test
    browser = uc.Chrome(
  File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/undetected_chromedriver/v2.py", line 304, in __init__
    super(Chrome, self).__init__(
  File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/undetected_chromedriver/v2.py", line 579, in start_session
    super(Chrome, self).start_session(capabilities, browser_profile)
  File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/opt/conda/envs/bot_env/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: user data directory is already in use, please specify a unique value for --user-data-dir argument, or don't use --user-data-dir

Additional logs from Selenium (last two entries, for brevity):

[1628001570.898][INFO]: Launching chrome: /usr/bin/google-chrome --allow-pre-commit-input --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-hang-monitor --disable-infobars --disable-plugins-discovery --disable-popup-blocking --disable-prompt-on-repost --disable-setuid-sandbox --disable-sync --enable-automation --enable-blink-features=ShadowDOMV0 --enable-logging --lang=en-US --log-level=0 --no-first-run --no-sandbox --no-service-autorun --password-store=basic --remote-debugging-host=127.0.0.1 --remote-debugging-port=49259 --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/tmp7b1nc7xd
[1628001571.926][INFO]: [c425997c3700079b1ce8c907274df291] RESPONSE InitSession ERROR invalid argument: user data directory is already in use, please specify a unique value for --user-data-dir argument, or don't use --user-data-dir

Interestingly enough, if I add options.add_argument("--headless") to the setup options, the error disappears.

Do you know what might be causing that, @ultrafunkamsterdam? Any hints or tips on what I should look at? Any help will be greatly appreciated!

KacperKubara avatar Aug 03 '21 14:08 KacperKubara

I believe the issue is related to https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/217

KacperKubara avatar Aug 04 '21 07:08 KacperKubara

this

-        if not options.debugger_address:
-            options.debugger_address = debug_addr
+        #if not options.debugger_address:
+        #    options.debugger_address = debug_addr

makes the Chrome subprocess started by UC useless. The purpose of v2 is to attach chromedriver to a Chrome instance that was not started by chromedriver; for that, options.debugger_address must point to an already running Chrome instance. This is called the remote Chrome approach, and it avoids several fingerprints that may be added when Chrome is started by chromedriver.

You have 2 instances of Chrome running, both set to use the same user_data_dir, so that is probably why the user profile is reported as already in use.
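
One way to rule this out is to give every Chrome launch its own profile directory. A minimal sketch using only the standard library; the helper name fresh_profile_dir and the "uc-profile-" prefix are made up for illustration, and the commented-out last line only hints at where the path would go in the options from the original post:

```python
import tempfile
from pathlib import Path


def fresh_profile_dir(prefix: str = "uc-profile-") -> Path:
    """Create and return a new, empty directory that no other Chrome instance uses."""
    # mkdtemp creates a directory with a unique name on every call
    return Path(tempfile.mkdtemp(prefix=prefix))


first = fresh_profile_dir()
second = fresh_profile_dir()
print(first, second)  # two different paths

# options.user_data_dir = str(first)  # hypothetical use with uc.ChromeOptions()
```

The directories are not cleaned up automatically, so a long-running bot would probably want to remove them (for example with shutil.rmtree) once the browser has exited.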

HMaker avatar Aug 11 '21 15:08 HMaker

Interesting, so how do I make those 2 instances of Chrome use different user_data_dir, since there is only one argument for the user directory that is shared among those 2 processes?

KacperKubara avatar Aug 11 '21 16:08 KacperKubara

@KacperKubara I think you missed the point. UC v2 starts a Chrome instance through subprocess.Popen() so that chromedriver can connect to it through options.debugger_address. If you don't set the debugger address, chromedriver itself will start another Chrome instance and ignore the one started by UC v2. If you can't use the subprocess started by v2, don't use v2, use v1:

import undetected_chromedriver as uc

driver = uc.Chrome()
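
For readers who want to see what the "remote Chrome" flow looks like in general terms, here is a rough sketch with plain Selenium. It is not UC v2's actual implementation: the helper names (find_free_port, build_chrome_command, launch_and_attach) and the default binary path are assumptions for illustration, and only the flags already visible in the log above are used.

```python
import socket
import subprocess
import tempfile


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def build_chrome_command(binary: str, port: int, user_data_dir: str) -> list:
    """Command line for a Chrome that chromedriver attaches to instead of launching."""
    return [
        binary,
        "--remote-debugging-host=127.0.0.1",
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
        "--no-first-run",
    ]


def launch_and_attach(binary: str = "/usr/bin/google-chrome"):
    """Start Chrome ourselves, then point chromedriver at it (requires selenium)."""
    from selenium import webdriver  # imported lazily; the helpers above do not need it

    port = find_free_port()
    profile = tempfile.mkdtemp()
    browser = subprocess.Popen(build_chrome_command(binary, port, profile))

    options = webdriver.ChromeOptions()
    # With debugger_address set, chromedriver connects to the running Chrome
    # rather than spawning a second instance with its own profile.
    options.debugger_address = f"127.0.0.1:{port}"
    driver = webdriver.Chrome(options=options)
    return driver, browser
```

Note that the port is only known to be free at the moment it is probed, so another process could in principle grab it before Chrome starts.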

HMaker avatar Aug 11 '21 17:08 HMaker

Thanks @HMaker, I will try it out after I come back from holidays! If you happen to have a working Dockerfile with a sample Python script that you can share, it would be really helpful. I've been pulling my hair out over this issue for the past week 🤔

KacperKubara avatar Aug 11 '21 18:08 KacperKubara

I run it in headless mode, but it seems you want to use virtual displays. As I suspected, the problem was not in Docker: the subprocess is not an issue, since Docker allows multiple processes per container. The issue was wrong settings for UC v2. I use the following Dockerfile for headless:

FROM python:3.8.11-slim-buster

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# install chrome
RUN apt-get update \
    && apt-get install -y --no-install-recommends wget \
    && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
    && apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb \
    && rm google-chrome-stable_current_amd64.deb \
    && chown root:root -R /opt/google/chrome/ \
    && chmod 755 -R /opt/google/chrome/ \
    && chmod 4755 -R /opt/google/chrome/chrome-sandbox

# install python dependencies ...
RUN python -m venv /appenv
ENV VIRTUAL_ENV /appenv
ENV PATH /appenv/bin:$PATH
# maybe RUN pip install -r requirements.txt

# install chromedriver
ENV CHROMEDRIVER_PATH /appenv/chromedriver
RUN python -c "from undetected_chromedriver.patcher import Patcher; patcher = Patcher(executable_path='$CHROMEDRIVER_PATH', version_main=$(google-chrome --version | grep -Po ' [0-9]{1,3}\.' | sed 's:^.\(.*\).$:\1:')); assert patcher.auto()"
RUN chmod +x /appenv/chromedriver
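
The version_main=$(...) shell pipeline above only extracts Chrome's major version number. For anyone who finds the grep/sed combination hard to follow, here is a rough Python equivalent; chrome_major_version and installed_chrome_major are hypothetical helper names, and the sample string merely illustrates the usual format of the google-chrome --version output:

```python
import re
import subprocess


def chrome_major_version(version_output: str) -> int:
    """Extract the major version from text like 'Google Chrome 92.0.4515.131'."""
    match = re.search(r"(\d+)\.", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def installed_chrome_major(binary: str = "google-chrome") -> int:
    """Run `<binary> --version` and parse its output."""
    output = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    ).stdout
    return chrome_major_version(output)


print(chrome_major_version("Google Chrome 92.0.4515.131 "))  # prints 92
```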

Then in Python you can use:

import os
import undetected_chromedriver.v2 as uc

options = uc.ChromeOptions()
options.add_argument('--disable-gpu') # for headless
options.add_argument('--disable-dev-shm-usage') # uses /tmp for memory sharing
# disable popups on startup
options.add_argument('--no-first-run')
options.add_argument('--no-service-autorun')
options.add_argument('--no-default-browser-check')
options.add_argument('--password-store=basic')
# if you set options.user_data_dir make sure it won't be used by multiple chrome instances...
# if it's not set UC will create a random temp folder for it.
chrome = uc.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'), headless=True, options=options)

If you experience "chrome not reachable" in non-headless mode, see #268 for instructions on how to debug it.

HMaker avatar Aug 12 '21 15:08 HMaker

Thanks, I will try it out. I think that I actually managed to run chromedriver in headless mode. However, I need to run it in non-headless mode with a virtual screen, since meet.google.com can detect headless mode :( After I made headless mode work, the non-headless option gives me this error. Has anyone encountered that?

dockerfile_test1.py  dockerfile_test2.py  google-chrome-stable_current_amd64.deb
root@5b486aad4d13:/docker_tmp# python dockerfile_test2.py 
[876:925:0807/131757.864432:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[876:925:0807/131757.864475:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[922:922:0807/131757.865525:ERROR:gpu_init.cc(441)] Passthrough is not supported, GL is swiftshader
[876:925:0807/131757.875476:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[876:925:0807/131757.875516:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[876:989:0807/131757.909412:ERROR:object_proxy.cc(622)] Failed to call method: org.freedesktop.DBus.Properties.Get: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.UPower was not provided by any .service files
[876:989:0807/131757.909553:ERROR:object_proxy.cc(622)] Failed to call method: org.freedesktop.UPower.GetDisplayDevice: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.UPower was not provided by any .service files
[876:989:0807/131757.909690:ERROR:object_proxy.cc(622)] Failed to call method: org.freedesktop.UPower.EnumerateDevices: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.UPower was not provided by any .service files
[0807/131802.723241:ERROR:nacl_helper_linux.cc(307)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox 

KacperKubara avatar Aug 12 '21 18:08 KacperKubara

@KacperKubara pass options.add_argument("--no-sandbox"), because UC v2 sets that option only for headless mode (I wonder why). Disabling the sandbox is bad practice and only a workaround, but I could not find a way to run Chrome with the sandbox enabled. I tried following this guide to set it up: https://chromium.googlesource.com/chromium/src/+/refs/heads/main/docs/linux/suid_sandbox_development.md
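
Put together with the virtual display from the original post, the non-headless setup might look roughly like the sketch below. It assumes pyvirtualdisplay and undetected_chromedriver.v2 are installed and that Xvfb is available in the container; the function names are made up for illustration, and this has not been verified against meet.google.com.

```python
def non_headless_arguments() -> list:
    """Chrome flags for running with a real window inside a container."""
    return [
        "--no-sandbox",             # per the comment above, UC v2 adds this only in headless mode
        "--disable-dev-shm-usage",  # avoid relying on the small /dev/shm in containers
    ]


def open_page(url: str) -> str:
    """Load a page in non-headless Chrome on an Xvfb display and return its HTML."""
    # Third-party imports are kept local so the helper above works without them.
    from pyvirtualdisplay import Display
    import undetected_chromedriver.v2 as uc

    display = Display()
    display.start()
    try:
        options = uc.ChromeOptions()
        for argument in non_headless_arguments():
            options.add_argument(argument)
        browser = uc.Chrome(options=options)
        try:
            browser.get(url)
            return browser.page_source
        finally:
            browser.quit()
    finally:
        display.stop()
```

The nested try/finally blocks make sure both the browser and the Xvfb display are shut down even if the page load fails.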

HMaker avatar Aug 12 '21 18:08 HMaker

Hi,

Do you have a version for arm64? I searched the internet, but I could not find a stable Google Chrome .deb build for arm64.

jeromegallego68 avatar Nov 04 '21 22:11 jeromegallego68

@jeromegallego68 Google does not build Chrome for ARM64, except for Android phones. All you can do is use a Chromium build for ARM64; you can find it in the Ubuntu repositories as the chromium-browser package. Chrome is closed source; Chromium is its open-source core.
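
If the same script has to run on both amd64 and arm64 machines, the binary could be picked at runtime. A small sketch, assuming the executables are named google-chrome, chromium-browser or chromium on the PATH (the last name is an assumption based on how some distributions package it); pick_browser_binary is a hypothetical helper:

```python
import platform
import shutil


def pick_browser_binary(machine=None, which=shutil.which):
    """Return the path of a usable browser binary, or None if none is installed."""
    machine = (machine or platform.machine()).lower()
    if machine in ("aarch64", "arm64"):
        # Per the comment above there is no Chrome build here: fall back to Chromium.
        candidates = ["chromium-browser", "chromium"]
    else:
        candidates = ["google-chrome", "chromium-browser", "chromium"]
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


print(pick_browser_binary())
```

Whether undetected-chromedriver can patch a matching chromedriver for Chromium on ARM64 is a separate question that this sketch does not answer.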

HMaker avatar Nov 05 '21 15:11 HMaker

@HMaker Thanks, your answer confirms what I found in my research.

In the end I decided to run my Docker container under amd64 emulation (my local machine has an arm64 processor). That seems to work, but it cannot find the chromedriver.

Here is the error :

------                                                                                                                                                                            
 > [7/9] RUN python -c "from undetected_chromedriver.patcher import Patcher; patcher = Patcher(executable_path='/appenv/chromedriver', version_main=$(google-chrome --version | grep -Po ' [0-9]{1,3}\.' | sed 's:^.\(.*\).$:\1:')); assert patcher.auto()":
#11 2.526 Traceback (most recent call last):
#11 2.526   File "<string>", line 1, in <module>
#11 2.527   File "/appenv/lib/python3.8/site-packages/undetected_chromedriver/patcher.py", line 91, in auto
#11 2.529     ispatched = self.is_binary_patched(self.executable_path)
#11 2.529   File "/appenv/lib/python3.8/site-packages/undetected_chromedriver/patcher.py", line 209, in is_binary_patched
#11 2.529     with io.open(executable_path, "rb") as fh:
#11 2.529 FileNotFoundError: [Errno 2] No such file or directory: '/appenv/chromedriver'
------

Do you have an idea how to fix it? I guess I'm really close to succeeding.

jeromegallego68 avatar Nov 08 '21 10:11 jeromegallego68

@HMaker here is my Dockerfile, maybe something is wrong here

# As Scrapy runs on Python, I choose the official Python 3 Docker image.
FROM --platform=linux/amd64 python:3.8.11-slim-buster

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# install chrome
RUN apt-get update \
    && apt-get install -y --no-install-recommends wget \
    && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
    && apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb \
    && rm google-chrome-stable_current_amd64.deb \
    && chown root:root -R /opt/google/chrome/ \
    && chmod 755 -R /opt/google/chrome/ \
    && chmod 4755 -R /opt/google/chrome/chrome-sandbox

# install python dependencies ...
RUN python -m venv /appenv
RUN /appenv/bin/python -m pip install --upgrade pip
ENV VIRTUAL_ENV /appenv
ENV PATH /appenv/bin:$PATH
# maybe RUN pip install -r requirements.txt

# Copy the file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./
 
# Install Scrapy specified in requirements.txt.
RUN pip install -r requirements.txt

# install chromedriver
ENV CHROMEDRIVER_PATH /appenv/chromedriver
RUN python -c "from undetected_chromedriver.patcher import Patcher; patcher = Patcher(executable_path='$CHROMEDRIVER_PATH', version_main=$(google-chrome --version | grep -Po ' [0-9]{1,3}\.' | sed 's:^.\(.*\).$:\1:')); assert patcher.auto()"
RUN chmod +x /appenv/chromedriver

# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .
 
# Run the crawler when the container launches.
CMD [ "python3", "./go-spider.py" ]

jeromegallego68 avatar Nov 08 '21 15:11 jeromegallego68

Wondering if anyone has tried the remote Chrome function and was successful.

Anticope12 avatar Jun 10 '22 18:06 Anticope12