
Docker fails due to missing Docker socket

Open aangelo9 opened this issue 8 months ago • 33 comments

When trying to run invoke commands, I ran into:

# invoke print_env
12:02:00 - INFO  hdbg.py init_logger:1018                               > cmd='/venv/bin/invoke print_env'
12:02:00 - WARN  hserver.py _raise_invalid_host:777                     Don't recognize host: host_os_name=Linux, am_host_os_name=None
[sudo] password for ubuntu: 
sudo: a password is required
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Run 'docker run --help' for more information
[sudo] password for ubuntu: 
sudo: a password is required
Traceback (most recent call last):
  File "/venv/bin/invoke", line 8, in <module>
    sys.exit(program.run())
             ^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/invoke/program.py", line 398, in run
    self.execute()
  File "/venv/lib/python3.12/site-packages/invoke/program.py", line 583, in execute
    executor.execute(*self.tasks)
  File "/venv/lib/python3.12/site-packages/invoke/executor.py", line 140, in execute
    result = call.task(*args, **call.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/invoke/tasks.py", line 138, in __call__
    result = self.body(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/lib_tasks_print.py", line 92, in print_env
    henv.env_to_str(
  File "/app/helpers/henv.py", line 543, in env_to_str
    msg += get_system_signature()[0] + "\n"
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/henv.py", line 505, in get_system_signature
    txt_tmp = hserver.get_docker_info()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/hserver.py", line 683, in get_docker_info
    docker_needs_sudo_ = docker_needs_sudo()
                         ^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/hserver.py", line 631, in docker_needs_sudo
    assert False, "Failed to run docker"
           ^^^^^
AssertionError: Failed to run docker

This happened after commit b36a90c on master.

In helpers/hserver.py, docker_needs_sudo enforces that sudo is required.

# Taken from `helpers/hserver.py`
def docker_needs_sudo() -> bool:
    """
    Return whether Docker commands need to be run with sudo.
    """
    if not has_docker():
        return False
    # Another way to check is to see if your user is in the docker group:
    # > groups | grep docker
    rc = os.system("docker run hello-world 2>&1 >/dev/null")
    if rc == 0:
        return False
    #
    rc = os.system("sudo docker run hello-world 2>&1 >/dev/null")
    if rc == 0:
        return True
    assert False, "Failed to run docker"

Previously, when [sudo] password for ubuntu: is prompted, we could skip it with ctrl+c, and continue on. But with the new logic, sudo is a must or it raises an assertion early.
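One way to keep the check from blocking on a password prompt is to probe with `sudo -n`, which fails immediately instead of asking for a password. The sketch below is only an illustration of that idea, not the code in `helpers/hserver.py`; `probe_docker_sudo` and its injectable `run` argument are hypothetical names introduced here.

```python
import subprocess
from typing import Callable, List, Optional


def _run(cmd: List[str]) -> int:
    """Run `cmd` quietly and return its exit code (127 if the binary is missing)."""
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        return 127
    return proc.returncode


def probe_docker_sudo(run: Callable[[List[str]], int] = _run) -> Optional[bool]:
    """
    Return False if Docker works without sudo, True if it works with
    password-less sudo, and None if neither works.

    Hypothetical helper: it never prompts, so the caller decides what to do
    when Docker is unusable instead of hitting an assertion.
    """
    if run(["docker", "run", "--rm", "hello-world"]) == 0:
        return False
    # `sudo -n` (non-interactive) exits with an error instead of prompting.
    if run(["sudo", "-n", "docker", "run", "--rm", "hello-world"]) == 0:
        return True
    return None
```

Returning `None` leaves it to the caller to skip Docker-related info or to fail with a clearer message.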

FYI: @sonniki @gpsaggese

aangelo9 avatar Apr 16 '25 16:04 aangelo9

Good decision to file an issue @aangelo9 . We need to bypass this check for people not working on the server by inserting hserver.is_external_dev() somewhere, probably upstream from this function. @aangelo9 can you investigate a bit and make a proposal (in a PR) where would be the most fitting place to put it?

sonniki avatar Apr 16 '25 19:04 sonniki

Understood. I will draft a PR.

aangelo9 avatar Apr 16 '25 19:04 aangelo9

I've been rewriting all that logic and I can't test on all the external devices.

IIUC, my take is that either one can run docker without sudo or if it needs sudo, it needs to be password-less. We call docker from everywhere in the codebase and one can't skip or enter the password every single time.

So my solution is to ask contributors to:

  1. add their users to the sudoers
  2. make sudo password-less

This is the setup we have everywhere.

This means that we should update the documentation and not change the code.

I can help with improving the documentation.

Corollary: the problem is that we ignored a problem in the set-up since it was new and then this problem came back to bite us.

gpsaggese avatar Apr 17 '25 19:04 gpsaggese

So my solution is to ask contributors to: (1) add their users to the sudoers, (2) make sudo password-less. This is the setup we have everywhere. This means that we should update the documentation and not change the code. I can help document how to improve the documentation.

Let's do that then, only let's try to give this priority since it's blocking some interns from running Linter and other invokes that require Docker. Unfortunately, I don't have enough understanding of how this issue should be solved in the setup to be able to guide here.

sonniki avatar Apr 17 '25 20:04 sonniki

I have done a bit more investigating and I think that the current code does not check for a Linux VM. I have done:

# Add user to Docker group
sudo usermod -aG docker $USER

# Add user to sudoers
sudo visudo
# Add at the bottom $USER ALL=(ALL) NOPASSWD:ALL
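To confirm that the group change took effect, the `groups | grep docker` check mentioned in the `docker_needs_sudo` comment can be sketched in Python. This is a minimal sketch; `current_group_names` and `is_in_group` are hypothetical helper names, not functions from the helpers repo.

```python
import grp
import os
from typing import Iterable, List


def current_group_names() -> List[str]:
    """Return the names of the groups the current process belongs to."""
    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database (common in containers).
            continue
    return names


def is_in_group(group_name: str, group_names: Iterable[str]) -> bool:
    """Return whether `group_name` appears in `group_names`."""
    return group_name in set(group_names)


if __name__ == "__main__":
    print("in docker group:", is_in_group("docker", current_group_names()))
```

Note that after `usermod -aG` the new membership only shows up in a new login session, so an already-open shell keeps reporting the old groups.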

I used this temp fix to try to access the container:

def docker_needs_sudo() -> bool:
    """
    Return whether Docker commands need to be run with sudo.
    """
    if os.path.exists("/.dockerenv"):
        # We're inside a Docker container — skip check
        return False
    ...

I found out that for Linux, the Docker daemon socket is never mounted, hence DinD check fails.

/var/run/docker.sock:/var/run/docker.sock

rc = os.system("docker run hello-world 2>&1 >/dev/null")
if rc == 0:
    return False
#
rc = os.system("sudo docker run hello-world 2>&1 >/dev/null")
if rc == 0:
    return True

When in the Docker container, calling Docker will raise this error:

root@4ce53e2efec2:/app# docker images ls
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

aangelo9 avatar Apr 18 '25 15:04 aangelo9

Ok to put a hack to unblock, but let's clearly mark with a TODO(gp): Remove this as per HelpersTask578.

  1. What is your set up exactly?

  2. What does it mean "the current code does not check for Linux VM"?

  3. Adding users to sudoers is correct. Do you know if we have those instructions in the setup? If not, we should add them.

  4. There are two ways to run a Docker container when inside Docker: sibling and child containers. I don't understand why /var/run/docker.sock is not mounted, which is required for sibling containers.
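Whether the sibling-container prerequisite holds can be checked from inside the container by testing that the path exists and is a Unix socket. A minimal sketch, assuming the default socket path; `is_docker_socket_mounted` is a hypothetical name.

```python
import os
import stat


def is_docker_socket_mounted(path: str = "/var/run/docker.sock") -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    print("docker.sock mounted:", is_docker_socket_mounted())
```

This only tells whether the socket is present; whether the current user may talk to the daemon through it is a separate permission question.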

@samarth9008 and @heanhsok do you have more insights since you are Linux users?

gpsaggese avatar Apr 19 '25 13:04 gpsaggese

  1. Current Env Setup:
    • Windows 11 OS
    • VMWare Workstation Pro
    • Ubuntu x64 (22.04)

Or in codebase variables:

WARN  hserver.py _raise_invalid_host:790                     Don't recognize host: host_os_name=Linux, am_host_os_name=None
  2. From what I understand, lib_tasks_docker.py creates the tmp.docker-compose.yml file that configures the Docker container. However, under _generate_docker_compose_file():
# Taken from helpers/lib_tasks_docker.py
def _generate_docker_compose_file():
    ...
    if use_sibling_container:
        # Use sibling-container approach.
        base_app_spec["volumes"].append(
            "/var/run/docker.sock:/var/run/docker.sock"
        )
    ...

use_sibling_container is a bool variable from hserver.use_docker_sibling_containers():

# Taken from helpers/hserver.py
def use_docker_sibling_containers() -> bool:
    """
    Return whether to use Docker sibling containers.

    Using sibling containers requires that all Docker containers in the
    same network so that they can communicate with each other.
    """
    val = is_dev4() or _is_mac_version_with_sibling_containers()
    return val

Where is_dev4() checks for internal devs, and _is_mac_version_with_sibling_containers() checks for Mac. Neither covers Linux or external devs, which is why /var/run/docker.sock does not get mounted.

  3. Current setup does not ask users to add themselves to sudoers. I could update the document.

  4. There are 2 approaches to this:

  • This enables running basic invoke functions but does not support DinD:
def docker_needs_sudo() -> bool:
    """
    Return whether Docker commands need to be run with sudo.
    """
    if os.path.exists("/.dockerenv"):
        # We're inside a Docker container — skip check
        return False
  • The other approach is to maybe place hserver.is_external_dev() into hserver.use_docker_sibling_containers() as an additional condition to get /var/run/docker.sock mounted if DinD is absolutely necessary for interns.
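The second approach can be sketched as follows. The host checks are passed in as plain booleans so the example is self-contained; in the repo they would come from `hserver`. The keyword names and `base_app_volumes` are hypothetical stand-ins, not the actual signatures.

```python
from typing import List

DOCKER_SOCK_VOLUME = "/var/run/docker.sock:/var/run/docker.sock"


def use_docker_sibling_containers(
    *,
    is_dev4: bool,
    is_mac_with_sibling_containers: bool,
    is_external_dev: bool,
) -> bool:
    """Sketch: also enable sibling containers for external devs."""
    return is_dev4 or is_mac_with_sibling_containers or is_external_dev


def base_app_volumes(use_sibling_container: bool) -> List[str]:
    """Mirror the volume logic quoted from `_generate_docker_compose_file()`."""
    volumes: List[str] = []
    if use_sibling_container:
        # Sibling containers talk to the host daemon through its socket.
        volumes.append(DOCKER_SOCK_VOLUME)
    return volumes


if __name__ == "__main__":
    flag = use_docker_sibling_containers(
        is_dev4=False,
        is_mac_with_sibling_containers=False,
        is_external_dev=True,
    )
    print(base_app_volumes(flag))
```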

aangelo9 avatar Apr 19 '25 20:04 aangelo9

Current setup does not ask users to add themselves to sudoers. I could update the document.

Yes pls

The new approach I've been working on is about checking what functionalities are actually available in the system, rather than checking what computer is "interns" vs "mac" vs "dev" and then have a table that says "for this type of set-up, this is what we have". I'm ok with adding some hacks to keep working but they need to be clearly documented, so that we can remove them.
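A rough illustration of what "checking what functionalities are actually available" could look like is below. It is an assumption about the direction, not the actual rewrite; `DockerCapabilities` and `detect_docker_capabilities` are hypothetical names, and the probes are injectable so they can be faked in tests.

```python
import os
import shutil
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DockerCapabilities:
    """What the current system can do, independent of which host it is."""

    has_docker: bool
    has_docker_socket: bool
    is_inside_docker: bool


def detect_docker_capabilities(
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
) -> DockerCapabilities:
    """Probe the system instead of matching on host names or OS types."""
    return DockerCapabilities(
        has_docker=which("docker") is not None,
        has_docker_socket=exists("/var/run/docker.sock"),
        is_inside_docker=exists("/.dockerenv"),
    )


if __name__ == "__main__":
    print(detect_docker_capabilities())
```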

gpsaggese avatar Apr 20 '25 14:04 gpsaggese

Understood, I will add some hacks and mark them with TODO(gp): Remove this as per HelpersTask578 and update the setup document.

aangelo9 avatar Apr 21 '25 15:04 aangelo9

Current hack is:

  • Run Docker container as root user for external linux users.
  • Skip DinD for external linux users.

Problem:

  • For Ubuntu, the user inside the container is "ubuntu" and does not have read or write permissions, which prevents linter from working.
  • The "ubuntu" user is not in the sudoers file, and adding it requires modifying the Docker image.

aangelo9 avatar Apr 22 '25 17:04 aangelo9

You can file an issue for this. It's a bit weird that there is a problem, since we use ubuntu on the server and everything is fine.

Maybe the ubuntu user on your machine has a different id than the one on the server. In that case, the fix is adding the id of the ubuntu user to our Docker containers.

@heanhsok any idea about this?

gpsaggese avatar Apr 23 '25 22:04 gpsaggese

12:02:00 - INFO hdbg.py init_logger:1018 > cmd='/venv/bin/invoke print_env'
12:02:00 - WARN hserver.py _raise_invalid_host:777 Don't recognize host: host_os_name=Linux, am_host_os_name=None
[sudo] password for ubuntu:
sudo: a password is required
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Let's rewind a bit. It looks like the errors start from here.

Is this error coming from when running on host or inside the container? Could you try both and share the output? Please make sure to clear all the hacks first and use only the code from master.

  • To run on host,
i print_env
  • To run in container,
i docker_bash
...
/venv/bin/invoke print_env

If neither succeeds, can you try running the following commands and share the output?

> heanhs@dev1:~/src/cmamp2$ echo $USER
heanhs
> heanhs@dev1:~/src/cmamp2$ echo $UID
1042
> heanhs@dev1:~/src/cmamp2$ groups $USER
heanhs : docker
> heanhs@dev1:~/src/cmamp2$ which docker
/usr/bin/docker
> heanhs@dev1:~/src/cmamp2$ ls -l /var/run/docker.sock
srw-rw---- 1 root docker 0 Apr  6 02:06 /var/run/docker.sock

@aangelo9

heanhsok avatar Apr 24 '25 05:04 heanhsok

  • Output for i print_env.
(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ i print_env
12:46:00 - INFO  hdbg.py init_logger:1018                               > cmd='/home/alvinoangelo/src/venv/client_venv.helpers/bin/invoke print_env'
12:46:00 - WARN  hserver.py _raise_invalid_host:782                     Don't recognize host: host_os_name=Linux, am_host_os_name=None
12:46:00 - WARN  henv.py _get_psutil_info:372                           psutil is not installed: No module named 'psutil'
# Repo config
  get_host_name='github.com'
  get_html_dir_to_url_mapping='{'s3://cryptokaizen-html': 'http://172.30.2.44', 's3://cryptokaizen-html/v2': 'http://172.30.2.44/v2'}'
  get_invalid_words='[]'
  get_docker_base_image_name='helpers'
# Server config
  enable_privileged_mode='False'
  get_docker_shared_group=''
  get_docker_user=''
  get_host_user_name='alvinoangelo'
  get_shared_data_dirs='None'
  has_dind_support='False'
  has_docker_sudo='True'
  is_AM_S3_available='True'
  is_CK_S3_available='True'
  is_dev4='False'
  is_dev_csfy='False'
  is_external_linux='True'
  is_host_mac='False'
  is_ig_prod='False'
  is_inside_ci='False'
  is_inside_docker='False'
  is_inside_ecs_container='False'
  is_inside_unit_test='False'
  is_prod_csfy='False'
  run_docker_as_root='False'
  skip_submodules_test='False'
  use_docker_db_container_name_to_connect='False'
  use_docker_network_mode_host='False'
  use_docker_sibling_containers='False'
  use_main_network='False'
# System signature
  # Container version
    container_version='None'
    changelog_version='1.2.0'
  # Git info
    branch_name='master'
    hash='32f2268'
    # Last commits:
      * 32f2268 Sonya Nikiforova HelpersTask393: Rename doc (#610)                                 (   8 hours ago) Thu Apr 24 05:11:39 2025  (HEAD -> master, origin/master, origin/HEAD)
      * 7d8baea aangelo9 HelpersTask393_Review_systems_to_automate_code_review (#604)      (   9 hours ago) Thu Apr 24 03:53:46 2025           
      * fe23e50 Sandeep Thalapanane HelpersTask596_Links_are_incorrectly_converted_inside_fenced_blocks (#608) (  21 hours ago) Wed Apr 23 15:44:53 2025           
  # Platform info
    system=Linux
    node name=alvinoangelo
    release=6.8.0-57-generic
    version=#59~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Mar 19 17:07:41 UTC 2
    machine=x86_64
    processor=x86_64
  # psutils info
    psutil is not installed
  # Docker info
    has_docker=True
    docker_version='28.1.1'
    docker_needs_sudo=False
    has_privileged_mode=True
    is_inside_docker=False
    has_sibling_containers_support=*undef*
    has_docker_dind_support=*undef*
  # Packages
    python: 3.10.12
    cvxopt: ?
    cvxpy: ?
    gluonnlp: ?
    gluonts: ?
    joblib: ?
    mxnet: ?
    numpy: 2.2.4
    pandas: 2.2.3
    pyarrow: ?
    scipy: ?
    seaborn: ?
    sklearn: ?
    statsmodels: ?
# Env vars
  CSFY_AWS_ACCESS_KEY_ID=undef
  CSFY_AWS_DEFAULT_REGION=undef
  CSFY_AWS_S3_BUCKET='cryptokaizen-data'
  CSFY_AWS_SECRET_ACCESS_KEY=undef
  CSFY_AWS_SESSION_TOKEN=undef
  CSFY_CI=undef
  CSFY_ECR_BASE_PATH='causify'
  CSFY_ENABLE_DIND=undef
  CSFY_FORCE_TEST_FAIL=undef
  CSFY_HOST_NAME='alvinoangelo'
  CSFY_HOST_OS_NAME='Linux'
  CSFY_HOST_USER_NAME='alvinoangelo'
  CSFY_HOST_VERSION=undef
  CSFY_REPO_CONFIG_CHECK=undef
  CSFY_REPO_CONFIG_PATH=undef
  GH_ACTION_ACCESS_TOKEN=undef

  • Output for i docker_bash, I was not able to enter the container.
(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ i docker_bash
12:46:43 - INFO  hdbg.py init_logger:1018                               > cmd='/home/alvinoangelo/src/venv/client_venv.helpers/bin/invoke docker_bash'
# docker_bash: base_image='', stage='dev', version='', use_entrypoint=True, as_user=True, generate_docker_compose_file=True, container_dir_name='.', skip_pull=False, skip_docker_image_compatibility_check=False
12:46:43 - WARN  hserver.py _raise_invalid_host:782                     Don't recognize host: host_os_name=Linux, am_host_os_name=None
# docker_pull: stage='dev', version=None, skip_pull=False
# docker_login: target_registry='aws_ecr.ck'
12:46:44 - WARN  lib_tasks_docker.py docker_login:405                   Skipping Docker login process for Helpers or Tutorials
12:46:44 - INFO  lib_tasks_docker.py _docker_pull:230                   image='causify/helpers:dev'
docker pull causify/helpers:dev
dev: Pulling from causify/helpers
Digest: sha256:43ac049013f992d7efc4a8196bfa15dc0b3f7559e52848adf825c3c7b5c84ca3
Status: Image is up to date for causify/helpers:dev
docker.io/causify/helpers:dev
IMAGE=causify/helpers:dev \
        docker compose \
        --file /home/alvinoangelo/src/helpers1/devops/compose/tmp.docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name alvinoangelo.helpers.app.helpers1.20250424_124644 \
        --user $(id -u):$(id -g) \
        app \
        bash 
WARN[0000] The "CSFY_FORCE_TEST_FAIL" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_SESSION_TOKEN" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_TELEGRAM_TOKEN" variable is not set. Defaulting to a blank string. 
WARN[0000] The "OPENAI_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] /home/alvinoangelo/src/helpers1/devops/compose/tmp.docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
##> devops/docker_run/entrypoint.sh
UID=1000
GID=1000
CSFY_USE_HELPERS_AS_NESTED_MODULE=0
CSFY_HOST_GIT_ROOT_PATH=/home/alvinoangelo/src/helpers1
CSFY_GIT_ROOT_PATH=/app
CSFY_HELPERS_ROOT_PATH=/app
> source /app/dev_scripts_helpers/thin_client/thin_client_utils.sh ...
AM_CONTAINER_VERSION='1.2.0'
CSFY_USE_HELPERS_AS_NESTED_MODULE=0
##> devops/docker_run/docker_setenv.sh
> source /app/dev_scripts_helpers/thin_client/thin_client_utils.sh ...
# activate_docker_venv()
# set_path()
PATH=.:./.github:./devops:./helpers:./.vscode:./.git:./papers:./dev_scripts_helpers:./.mypy_cache:./config_root:./docs:./import_check:./linters::/app:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# set_up_docker_git()
git --version: git version 2.43.0
/app
# set_pythonpath()
Adding /app to PYTHONPATH
PYTHONPATH=/app:
# Configure env
WARNING: /var/run/docker.sock doesn't exist
# set_up_docker_git()
git --version: git version 2.43.0
/app
# invoke print_env
12:46:46 - INFO  hdbg.py init_logger:1018                               > cmd='/venv/bin/invoke print_env'
12:46:46 - WARN  hserver.py _raise_invalid_host:782                     Don't recognize host: host_os_name=Linux, am_host_os_name=None
[sudo] password for ubuntu: 
sudo: a password is required
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Run 'docker run --help' for more information
[sudo] password for ubuntu: 
sudo: a password is required
Traceback (most recent call last):
  File "/venv/bin/invoke", line 8, in <module>
    sys.exit(program.run())
             ^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/invoke/program.py", line 398, in run
    self.execute()
  File "/venv/lib/python3.12/site-packages/invoke/program.py", line 583, in execute
    executor.execute(*self.tasks)
  File "/venv/lib/python3.12/site-packages/invoke/executor.py", line 140, in execute
    result = call.task(*args, **call.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/invoke/tasks.py", line 138, in __call__
    result = self.body(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/lib_tasks_print.py", line 92, in print_env
    henv.env_to_str(
  File "/app/helpers/henv.py", line 543, in env_to_str
    msg += get_system_signature()[0] + "\n"
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/henv.py", line 505, in get_system_signature
    txt_tmp = hserver.get_docker_info()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/hserver.py", line 688, in get_docker_info
    docker_needs_sudo_ = docker_needs_sudo()
                         ^^^^^^^^^^^^^^^^^^^
  File "/app/helpers/hserver.py", line 636, in docker_needs_sudo
    assert False, "Failed to run docker"
           ^^^^^
AssertionError: Failed to run docker
  • Other outputs.
(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ echo $USER
alvinoangelo
(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ echo $UID
1000
(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ groups $USER
alvinoangelo : alvinoangelo adm cdrom sudo dip plugdev lpadmin lxd sambashare docker
(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ which docker
/home/alvinoangelo/bin/docker
(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ ls -l /var/run/docker.sock
srw-rw---- 1 root docker 0 Apr 24 10:32 /var/run/docker.sock

aangelo9 avatar Apr 24 '25 16:04 aangelo9

Can u share your devops/compose/tmp.docker-compose.yml as well? I suspect this var CSFY_ENABLE_DIND is set to 0

heanhsok avatar Apr 24 '25 17:04 heanhsok

CSFY_ENABLE_DIND is set to 0.

version: '3'
services:
  base_app:
    cap_add:
      - SYS_ADMIN
    environment:
      - CSFY_ENABLE_DIND=0
      - CSFY_FORCE_TEST_FAIL=$CSFY_FORCE_TEST_FAIL
      - CSFY_HOST_NAME=alvinoangelo
      - CSFY_HOST_OS_NAME=Linux
      - CSFY_HOST_USER_NAME=alvinoangelo
      - CSFY_HOST_VERSION=6.8.0-57-generic
      - CSFY_REPO_CONFIG_CHECK=True
      - CSFY_REPO_CONFIG_PATH=
      - CSFY_AWS_ACCESS_KEY_ID=$CSFY_AWS_ACCESS_KEY_ID
      - CSFY_AWS_DEFAULT_REGION=$CSFY_AWS_DEFAULT_REGION
      - CSFY_AWS_PROFILE=$CSFY_AWS_PROFILE
      - CSFY_AWS_S3_BUCKET=$CSFY_AWS_S3_BUCKET
      - CSFY_AWS_SECRET_ACCESS_KEY=$CSFY_AWS_SECRET_ACCESS_KEY
      - CSFY_AWS_SESSION_TOKEN=$CSFY_AWS_SESSION_TOKEN
      - CSFY_ECR_BASE_PATH=$CSFY_ECR_BASE_PATH
      - CSFY_HOST_GIT_ROOT_PATH=/home/alvinoangelo/src/helpers1
      - CSFY_GIT_ROOT_PATH=/app
      - CSFY_HELPERS_ROOT_PATH=/app
      - CSFY_USE_HELPERS_AS_NESTED_MODULE=0
      - CSFY_TELEGRAM_TOKEN=$CSFY_TELEGRAM_TOKEN
      - CSFY_CI=$CSFY_CI
      - OPENAI_API_KEY=$OPENAI_API_KEY
      - GH_ACTION_ACCESS_TOKEN=$GH_ACTION_ACCESS_TOKEN
      - GH_TOKEN=$GH_ACTION_ACCESS_TOKEN
    image: ${IMAGE}
    restart: 'no'
    volumes:
      - ~/.aws:/home/.aws
      - ~/.config/gspread_pandas/:/home/.config/gspread_pandas/
      - ~/.config/gh:/home/.config/gh
      - ~/.ssh:/home/.ssh
  app:
    extends: base_app
    volumes:
      - /home/alvinoangelo/src/helpers1:/app
    working_dir: /app
  linter:
    extends: base_app
    volumes:
      - /home/alvinoangelo/src/helpers1:/src
      - ../../:/app
    working_dir: /src
    environment:
      - MYPYPATH
  jupyter_server:
    command: devops/docker_run/run_jupyter_server.sh
    environment:
      - PORT=${PORT}
    extends: app
    network_mode: ${NETWORK_MODE:-bridge}
    ports:
      - ${PORT}:${PORT}
  jupyter_server_test:
    command: jupyter notebook -h 2>&1 >/dev/null
    environment:
      - PORT=${PORT}
    extends: app
    network_mode: ${NETWORK_MODE:-bridge}
    ports:
      - ${PORT}:${PORT}

aangelo9 avatar Apr 24 '25 17:04 aangelo9

CSFY_ENABLE_DIND is set to 0.

It makes sense. I think this is where the problem is.

Could you try adding another clause here to allow external devs to use privileged mode, and then rerun i docker_bash?

elif is_external_linux():
  ret = True

https://github.com/causify-ai/helpers/blob/32f2268843359438bc9adcdd1124f4e05ab019b1/helpers/hserver.py#L784-L814

heanhsok avatar Apr 24 '25 17:04 heanhsok

I've tried that approach, but it still asks for sudo password and stays stuck when ctrl+c.

(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ i docker_bash
13:44:10 - INFO  hdbg.py init_logger:1018                               > cmd='/home/alvinoangelo/src/venv/client_venv.helpers/bin/invoke docker_bash'
# docker_bash: base_image='', stage='dev', version='', use_entrypoint=True, as_user=True, generate_docker_compose_file=True, container_dir_name='.', skip_pull=False, skip_docker_image_compatibility_check=False
# docker_pull: stage='dev', version=None, skip_pull=False
# docker_login: target_registry='aws_ecr.ck'
13:44:10 - WARN  lib_tasks_docker.py docker_login:405                   Skipping Docker login process for Helpers or Tutorials
13:44:10 - INFO  lib_tasks_docker.py _docker_pull:230                   image='causify/helpers:dev'
docker pull causify/helpers:dev
dev: Pulling from causify/helpers
Digest: sha256:43ac049013f992d7efc4a8196bfa15dc0b3f7559e52848adf825c3c7b5c84ca3
Status: Image is up to date for causify/helpers:dev
docker.io/causify/helpers:dev
IMAGE=causify/helpers:dev \
        docker compose \
        --file /home/alvinoangelo/src/helpers1/devops/compose/tmp.docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name alvinoangelo.helpers.app.helpers1.20250424_134410 \
        --user $(id -u):$(id -g) \
        app \
        bash 
WARN[0000] The "CSFY_FORCE_TEST_FAIL" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_SESSION_TOKEN" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_TELEGRAM_TOKEN" variable is not set. Defaulting to a blank string. 
WARN[0000] The "OPENAI_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] /home/alvinoangelo/src/helpers1/devops/compose/tmp.docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
##> devops/docker_run/entrypoint.sh
UID=1000
GID=1000
CSFY_USE_HELPERS_AS_NESTED_MODULE=0
CSFY_HOST_GIT_ROOT_PATH=/home/alvinoangelo/src/helpers1
CSFY_GIT_ROOT_PATH=/app
CSFY_HELPERS_ROOT_PATH=/app
> source /app/dev_scripts_helpers/thin_client/thin_client_utils.sh ...
AM_CONTAINER_VERSION='1.2.0'
CSFY_USE_HELPERS_AS_NESTED_MODULE=0
##> devops/docker_run/docker_setenv.sh
> source /app/dev_scripts_helpers/thin_client/thin_client_utils.sh ...
# activate_docker_venv()
# set_path()
PATH=.:./.github:./devops:./helpers:./.vscode:./.git:./papers:./dev_scripts_helpers:./.mypy_cache:./config_root:./docs:./import_check:./linters::/app:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# set_up_docker_git()
git --version: git version 2.43.0
/app
# set_pythonpath()
Adding /app to PYTHONPATH
PYTHONPATH=/app:
# Configure env
# set_up_docker_in_docker()
[sudo] password for ubuntu: 
[sudo] password for ubuntu: sudo: a password is required

Could it be because it uses the Docker image causify/helpers:dev instead of causify/helpers:prod, since I'm working on the helpers repo?

aangelo9 avatar Apr 24 '25 17:04 aangelo9

causify/helpers:dev is correct because you're not running linter. The weird thing is that it should not ask for a password. Also not sure why this is the same?

##> devops/docker_run/entrypoint.sh
UID=1000
GID=1000

can u try?

getent group 1000

heanhsok avatar Apr 24 '25 18:04 heanhsok

getent group 1000 output:

(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ getent group 1000
alvinoangelo:x:1000:

aangelo9 avatar Apr 24 '25 18:04 aangelo9

Oh wait i think the dind is still not set up from your above log. There should be this set_up_docker_in_docker step in the log. Can you trace back to variable that caused this part to skip? Is CSFY_ENABLE_DIND set to 1 now?

Example log when it is setup.

# set_up_docker_git()
git --version: git version 2.43.0
/app
# set_pythonpath()
Adding /app/helpers_root to PYTHONPATH
Adding /app to PYTHONPATH
PYTHONPATH=/app:/app/helpers_root:
# Configure env
# set_up_docker_in_docker()
{ "storage-driver": "vfs" }
 * Starting Docker: docker                                                                                                                         [ OK ]
 * Docker is running
Waiting for /var/run/docker.sock to be created.
Permissions for /var/run/docker.sock have been changed.
Setting sudo docker permissions
srw-rw-rw- 1 root docker 0 Apr 24 18:16 /var/run/docker.sock
srw-rw-rw- 1 root docker 0 Apr 24 18:16 /var/run/docker.sock
# set_up_docker_git()
git --version: git version 2.43.0
/app
# invoke print_env

heanhsok avatar Apr 24 '25 18:04 heanhsok

Yes, CSFY_ENABLE_DIND = 1 now. I figured out that this line was the one making the issue.

sudo echo '{ "storage-driver": "vfs" }' | sudo tee -a /etc/docker/daemon.json

https://github.com/causify-ai/helpers/blob/32f2268843359438bc9adcdd1124f4e05ab019b1/dev_scripts_helpers/thin_client/thin_client_utils.sh#L325-L367

aangelo9 avatar Apr 24 '25 18:04 aangelo9

What happens when you run that command on your host? Does it ask for a password?

sudo echo '{ "storage-driver": "vfs" }' | sudo tee -a /etc/docker/daemon.json

heanhsok avatar Apr 24 '25 18:04 heanhsok

No it does not ask for password when run on host.

It also mounted successfully.

(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ cat /etc/docker/daemon.json
{ "storage-driver": "vfs" }

aangelo9 avatar Apr 24 '25 19:04 aangelo9

I see. ATM I'm not sure what's causing it yet. I have been using it on my Mac and on Linux on the server with no issue. I'll try to run it on my Ubuntu running on VM software and see if I can reproduce it.

heanhsok avatar Apr 24 '25 19:04 heanhsok

Problem

The weird thing is that it should not ask for pw.

I have run it on an Ubuntu running on VM software and had that same issue.

  • The problem is that the first user from the VM uses the same user id of 1000 as the ubuntu user in the docker container
  • We have added the user_1000 (with user id 1000) to the etc_sudoers but user ubuntu is not
  • As checked, user_1000 with id 1000 was not even created (probably because the id is already occupied by the ubuntu user)
root@6934d3ed9bda:/app# getent passwd
ubuntu:x:1000:1000:Ubuntu:/home/ubuntu:/bin/bash
...
user_501:x:501:1001::/home:/bin/sh
user_1001:x:1001:1002::/home:/bin/sh
user_1002:x:1002:1003::/home:/bin/sh
...
  • So when the container starts, this user id 1000, which corresponds to the ubuntu user (which is not in the etc_sudoers file), is used
IMAGE=causify/helpers:dev \
        docker compose \
        --file /home/alvinoangelo/src/helpers1/devops/compose/tmp.docker-compose.yml \
        ...
        --user $(id -u):$(id -g) 
  • That's why it keeps asking for a password when it shouldn't
  • I guess we haven't had this issue on our dev server before because our users have user id of > 1000
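The uid collision can be made explicit by mapping uids to user names from the `getent passwd` output above. This is a small sketch using only the lines shown in that output; `uid_to_user` is a hypothetical helper.

```python
from typing import Dict

# Lines copied from the `getent passwd` output shown above.
PASSWD_SAMPLE = """\
ubuntu:x:1000:1000:Ubuntu:/home/ubuntu:/bin/bash
user_501:x:501:1001::/home:/bin/sh
user_1001:x:1001:1002::/home:/bin/sh
user_1002:x:1002:1003::/home:/bin/sh
"""


def uid_to_user(passwd_text: str) -> Dict[int, str]:
    """Map each uid to the first user name that owns it."""
    mapping: Dict[int, str] = {}
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        # Keep the first entry per uid, as the name lookup would.
        mapping.setdefault(int(fields[2]), fields[0])
    return mapping


users = uid_to_user(PASSWD_SAMPLE)
print(users.get(1000))  # ubuntu
print(users.get(1001))  # user_1001
```

With `--user $(id -u):$(id -g)` resolving to 1000 on the VM, the container runs as `ubuntu`, so a sudoers entry for `user_1000` never applies.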

Testing

@aangelo9

Just to test, you can try hard coding this value to 1001, and the invoke bash should work. (Please keep the is_external_linux condition in the enable_privileged_mode() that u did above so that the CSFY_ENABLE_DIND is set to 1)

if as_user:                      
        docker_cmd_.append(
            r"""                                                    
        --user 1001:$(id -g)"""
        )
  • https://github.com/causify-ai/helpers/blob/5ff3dc086faa689a2635494f906a7f86e1f100e4/helpers/lib_tasks_docker.py#L1261C1-L1265C10

Solution

  • We can add the ubuntu to the no pwd check list in the etc_sudoers file and rebuild the image (since this is probably not gonna change as we're using the ubuntu base image)
# Linux users.
ubuntu ALL=(ALL) NOPASSWD:ALL
user_1000 ALL=(ALL) NOPASSWD:ALL
  • I created another user and got the user id of 1001 and things work. (A bit inconvenient if everyone has to do it, but I did it just to test it out)
  • We can also change the user id to a different number (i.e. 1001) and transfer ownership of all the files to the new user id (although it can be a bit destructive if not done carefully)
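For the first option, a quick sanity check that a given user has a `NOPASSWD` entry could look like the sketch below. It is deliberately simplistic: it handles plain `user ALL=(ALL) NOPASSWD:ALL` lines like the ones above and ignores groups, aliases, and include directives that real sudoers files support. `nopasswd_users` is a hypothetical name.

```python
from typing import Set


def nopasswd_users(sudoers_text: str) -> Set[str]:
    """Return user names that have a NOPASSWD rule in simple sudoers text."""
    users: Set[str] = set()
    for raw in sudoers_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and %group rules.
        if not line or line.startswith("#") or line.startswith("%"):
            continue
        parts = line.split()
        if len(parts) >= 2 and "NOPASSWD:" in line:
            users.add(parts[0])
    return users


SUDOERS_SAMPLE = """\
# Linux users.
ubuntu ALL=(ALL) NOPASSWD:ALL
user_1000 ALL=(ALL) NOPASSWD:ALL
"""

print(sorted(nopasswd_users(SUDOERS_SAMPLE)))  # ['ubuntu', 'user_1000']
```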

WDYT? @gpsaggese @sonniki

heanhsok avatar Apr 24 '25 23:04 heanhsok

I ran into a ulimit error when testing:

(client_venv.helpers) alvinoangelo@alvinoangelo:~/src/helpers1$ i docker_bash
21:24:03 - INFO  hdbg.py init_logger:1018                               > cmd='/home/alvinoangelo/src/venv/client_venv.helpers/bin/invoke docker_bash'
# docker_bash: base_image='', stage='dev', version='', use_entrypoint=True, as_user=True, generate_docker_compose_file=True, container_dir_name='.', skip_pull=False, skip_docker_image_compatibility_check=False
# docker_pull: stage='dev', version=None, skip_pull=False
# docker_login: target_registry='aws_ecr.ck'
21:24:03 - WARN  lib_tasks_docker.py docker_login:405                   Skipping Docker login process for Helpers or Tutorials
21:24:03 - INFO  lib_tasks_docker.py _docker_pull:230                   image='causify/helpers:dev'
docker pull causify/helpers:dev
dev: Pulling from causify/helpers
Digest: sha256:43ac049013f992d7efc4a8196bfa15dc0b3f7559e52848adf825c3c7b5c84ca3
Status: Image is up to date for causify/helpers:dev
docker.io/causify/helpers:dev
IMAGE=causify/helpers:dev \
        docker compose \
        --file /home/alvinoangelo/src/helpers1/devops/compose/tmp.docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name alvinoangelo.helpers.app.helpers1.20250424_212403 \
        --user 1001:$1001 \
        app \
        bash 
WARN[0000] The "CSFY_FORCE_TEST_FAIL" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_DEFAULT_REGION" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_AWS_SESSION_TOKEN" variable is not set. Defaulting to a blank string. 
WARN[0000] The "CSFY_TELEGRAM_TOKEN" variable is not set. Defaulting to a blank string. 
WARN[0000] The "OPENAI_API_KEY" variable is not set. Defaulting to a blank string. 
WARN[0000] /home/alvinoangelo/src/helpers1/devops/compose/tmp.docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
##> devops/docker_run/entrypoint.sh
UID=1001
GID=1
CSFY_USE_HELPERS_AS_NESTED_MODULE=0
CSFY_HOST_GIT_ROOT_PATH=/home/alvinoangelo/src/helpers1
CSFY_GIT_ROOT_PATH=/app
CSFY_HELPERS_ROOT_PATH=/app
> source /app/dev_scripts_helpers/thin_client/thin_client_utils.sh ...
AM_CONTAINER_VERSION='1.2.0'
CSFY_USE_HELPERS_AS_NESTED_MODULE=0
##> devops/docker_run/docker_setenv.sh
> source /app/dev_scripts_helpers/thin_client/thin_client_utils.sh ...
# activate_docker_venv()
# set_path()
PATH=.:./.github:./devops:./helpers:./.vscode:./.git:./papers:./dev_scripts_helpers:./.mypy_cache:./config_root:./docs:./import_check:./linters::/app:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# set_up_docker_git()
git --version: git version 2.43.0
/app
# set_pythonpath()
Adding /app to PYTHONPATH
PYTHONPATH=/app:
# Configure env
# set_up_docker_in_docker()
{ "storage-driver": "vfs" }
/etc/init.d/docker: 69: ulimit: error setting limit (Operation not permitted)

It also gives the same error when I do:

if as_user:                      
        docker_cmd_.append(
            r"""                                                    
        --user 1001:1001"""
        )

aangelo9 avatar Apr 25 '25 01:04 aangelo9

Hmm, that's weird, but at least it doesn't ask for a password this time. You could try deleting the image locally and rerunning.

Also, it looks like our colleague had a similar issue and added a fix here https://github.com/causify-ai/helpers/blob/5ff3dc086faa689a2635494f906a7f86e1f100e4/dev_scripts_helpers/thin_client/thin_client_utils.sh#L334-L338

/etc/init.d/docker: 69: ulimit: error setting limit (Operation not permitted)

You can also bash into the container and find out what that line is. See if you can change the ulimit and start Docker manually from inside the container.

docker run -it --user 1001:1001 --entrypoint bash causify/helpers:dev

Feel free to do some debugging on your setup (in case there's an edge case that we don't know about). I did a quick search, and this ulimit error is common in DinD setups.
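
As a starting point for that debugging, here is a small sketch that can be run inside the container to print the current open-file limits. On Linux, an unprivileged process (one without CAP_SYS_RESOURCE) can lower its hard limit but cannot raise it, which is one common cause of "Operation not permitted" from ulimit. The would_raise_hard_limit helper and the target value are illustrative assumptions, not something taken from the init script.

```python
import resource


def would_raise_hard_limit(current_hard: int, target: int) -> bool:
    """Return True if setting `target` would raise the hard limit."""
    if current_hard == resource.RLIM_INFINITY:
        # Already unlimited: nothing can be a raise.
        return False
    if target == resource.RLIM_INFINITY:
        return True
    return target > current_hard


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard}")
    # Hypothetical target, for illustration only.
    print("raise needed:", would_raise_hard_limit(hard, 524288))
```

If the helper reports that a raise is needed, the limit change will only succeed in a container that has the required privilege.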

heanhsok avatar Apr 25 '25 14:04 heanhsok

My fix was to comment out every ulimit call in /etc/init.d/docker, including the if block that contains them. https://github.com/causify-ai/helpers/blob/5ff3dc086faa689a2635494f906a7f86e1f100e4/dev_scripts_helpers/thin_client/thin_client_utils.sh#L334-L338

# Comments out ulimit -Hn 524288.
sudo sed -i 's/ulimit -Hn/# ulimit -Hn/g' /etc/init.d/docker

# Comments out `if` block.
sudo sed -i '/if \[ "\$BASH" \]; then/,/fi/ s/^/#/' /etc/init.d/docker

		# Only set the hard limit (soft limit should remain as the system default of 1024):
		# ulimit -Hn 524288

		# Having non-zero limits causes performance problems due to accounting overhead
		# in the kernel. We recommend using cgroups to do container-local accounting.
#		if [ "$BASH" ]; then
#			ulimit -u unlimited
#		else
#			ulimit -p unlimited
#		fi
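
To preview what the two sed commands do without touching /etc/init.d/docker, the same transformation can be sketched in Python and applied to a copy of the script text. This is only an approximation of the sed behavior shown above (the comment_out_ulimit name is hypothetical); like the sed range, the block ends at the first following line that contains "fi".

```python
import re

# Same patterns as the sed address range: /if \[ "\$BASH" \]; then/,/fi/
BLOCK_START = re.compile(r'if \[ "\$BASH" \]; then')
BLOCK_END = re.compile(r"fi")


def comment_out_ulimit(script: str) -> str:
    """Comment out `ulimit -Hn` and the `if [ "$BASH" ]` block in `script`."""
    out = []
    in_block = False
    for line in script.splitlines(keepends=True):
        # Equivalent of: s/ulimit -Hn/# ulimit -Hn/g
        line = line.replace("ulimit -Hn", "# ulimit -Hn")
        if in_block:
            # Inside the range: prefix with '#', stop after the closing line.
            out.append("#" + line)
            if BLOCK_END.search(line):
                in_block = False
        elif BLOCK_START.search(line):
            out.append("#" + line)
            in_block = True
        else:
            out.append(line)
    return "".join(out)
```

Note that neither the sed commands nor this sketch are idempotent: running them twice adds a second "#" to the already commented lines.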

I was able to get inside the container with DinD working. However, I ran into a read-write permission error when running invoke lint.

PermissionError: [Errno 13] Permission denied: 'tmp.amp_normalize_import.txt'

This is probably because --user 1001:1001 has no write permission on the host files, and it should be fixed once the original ubuntu user is used.
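
A rough sketch of the classic Unix check behind that error, assuming the mounted directory is owned by the host user (for example UID 1000) and is not world-writable. The can_write helper is hypothetical and ignores ACLs, capabilities, and read-only mounts.

```python
import stat


def can_write(owner_uid: int, owner_gid: int, mode: int, uid: int, gids: set) -> bool:
    """Approximate whether `uid` (with groups `gids`) may write to a file or dir."""
    if uid == 0:
        # root bypasses the permission bits.
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


if __name__ == "__main__":
    import os

    # Check the current directory for the current user.
    st = os.stat(".")
    groups = set(os.getgroups()) | {os.getegid()}
    print(can_write(st.st_uid, st.st_gid, st.st_mode, os.getuid(), groups))
```

For a directory owned by 1000:1000 with mode 755, a process running as 1001:1001 falls into the "other" class and cannot create files in it, which matches the PermissionError above.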

aangelo9 avatar Apr 25 '25 16:04 aangelo9

  1. @aangelo9 If you created files with the "wrong" user, then only your root can delete them

  2. Adding the user to etc_sudoers is the right approach

# Linux users.
ubuntu ALL=(ALL) NOPASSWD:ALL
user_1000 ALL=(ALL) NOPASSWD:ALL

  3. @heanhsok Is there any "easy" way to repro this on one of our systems? Having a way to reproduce it ourselves and fix it is best (at least on your laptop, if you can see the error). I'm just thinking about the nightmare of debugging on other systems. Unless you think this was one-and-done

gpsaggese avatar Apr 28 '25 20:04 gpsaggese

any "easy" way to repro this on one of our systems?

I don't think we can have this exact setup on our dev server because the VM is running on a Windows host.

That said, in theory, I think we should try to make things work the same way whether it's Linux running on a VM with a Windows or Mac host, Linux on CI, or Linux on the dev server.

  • I was able to reproduce the first error (docker not started) by running a Linux VM on Mac and fixed it by adding the user to the etc_sudoers
  • I am still unable to reproduce the second issue (ulimit error) though

# Comments out `if` block.
sudo sed -i '/if \[ "\$BASH" \]; then/,/fi/ s/^/#/' /etc/init.d/docker

@aangelo9 The concern I have with this fix is that we're not sure if it will cause side effects to other parts of our dev systems as we're modifying the Docker service management script. It would be good if we could find a reference to an online thread discussing this issue and the fixes similar to "TODO(Vlad): Fix ulimit error: https://github.com/docker/cli/issues/4807."

Also, it would be good if we could test this on another machine with the same setup (another intern's machine, maybe?) so we can be sure that it's not a machine-specific issue (e.g. VM software version, Linux version, machine type, etc.). WDYT? @sonniki

heanhsok avatar Apr 28 '25 22:04 heanhsok