
We can run pre-commit hooks with Docker: two scenarios to choose from?

Open: jbampton opened this issue 5 months ago · 0 comments

Here is a quick example I created for mruby:

https://github.com/mruby/mruby/blob/8e074f185bd940f7dd52e3296f3c9bf70c830ba4/Dockerfile#L11

Apache Airflow uses both Docker and pre-commit.

Airflow's pre-commit config alone is 1700 lines long:

https://github.com/apache/airflow/blob/main/.pre-commit-config.yaml

Below is what Google Gemini says about it. Gemini can make mistakes, so double-check it.

Installing pre-commit hooks inside a Docker container usually means one of two things:

  1. You want pre-commit to run as part of your CI/CD pipeline within a Docker image. In this scenario, the Docker container is the environment where the hooks will execute to validate your code.
  2. You want to set up your development environment to run pre-commit hooks locally, but your development setup itself is Dockerized (e.g., using Docker Compose or a devcontainer). This is a bit more nuanced, as pre-commit typically interacts with your local Git repository.

Let's break down both scenarios:

Scenario 1: Installing Pre-Commit Hooks for CI/CD within a Docker Container

This is the more common and straightforward approach. You include pre-commit and your configuration in your Dockerfile so that your automated checks can run reliably.

Steps:

  1. Create your .pre-commit-config.yaml: Make sure you have a .pre-commit-config.yaml file at the root of your project. This file defines which hooks to run.

    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/pre-commit/pre-commit-hooks
        rev: v4.6.0 # pin a release tag; `pre-commit autoupdate` bumps it to the latest
        hooks:
          - id: trailing-whitespace
          - id: end-of-file-fixer
          - id: check-yaml
          - id: check-added-large-files
      - repo: https://github.com/psf/black
        rev: 24.4.2 # pin a release tag; `pre-commit autoupdate` bumps it to the latest
        hooks:
          - id: black
      # Add other hooks as needed (e.g., for Python, JavaScript, etc.)
    
  2. Add pre-commit installation to your Dockerfile: You'll need to install pre-commit and then "install" the hook environments. The pre-commit install-hooks command will download and set up the necessary tools for the hooks defined in your .pre-commit-config.yaml.

    # Dockerfile
    
    # Use an appropriate base image for your project
    FROM python:3.12-slim
    
    # pre-commit shells out to git, which slim images do not include
    RUN apt-get update \
        && apt-get install -y --no-install-recommends git \
        && rm -rf /var/lib/apt/lists/*
    
    # Install pre-commit
    RUN pip install --no-cache-dir pre-commit
    
    # Set working directory
    WORKDIR /app
    
    # Copy only the config, so the hook layer is rebuilt only when the config changes
    COPY .pre-commit-config.yaml /app/
    
    # pre-commit refuses to run outside a Git repository, so create a throwaway one,
    # build every hook environment into ~/.cache/pre-commit, then delete it.
    # Only this temporary .git is removed; your real repository is never copied in.
    RUN git init . && pre-commit install-hooks && rm -rf .git
    
    # At run time the checked-out repository (with its .git) is mounted at /app.
    # git may reject a bind-mounted repo owned by another user ("dubious ownership").
    RUN git config --global --add safe.directory /app
    
    # Default command for a CI step: check every file in the mounted repository
    CMD ["pre-commit", "run", "--all-files"]
    
    # If your app needs to run, add your usual entrypoint/cmd here
    # ENTRYPOINT ["python", "your_app.py"]

    Explanation of RUN git init . && pre-commit install-hooks && rm -rf .git:

    • git init .: pre-commit refuses to run outside a Git repository, even just to build hook environments, so we create a temporary one.
    • pre-commit install-hooks: This command processes your .pre-commit-config.yaml and downloads/installs the tools required by your hooks into ~/.cache/pre-commit within the container. This makes your Docker image self-contained with all the necessary hook dependencies.
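    • rm -rf .git: Once the environments are built, the temporary .git directory is no longer needed, so we remove it to keep the image clean. This also means the image contains no repository at all: pre-commit run needs one, so mount your checked-out repository (including .git) into the container when you run it.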
  3. Run pre-commit in your CI/CD pipeline: In your CI/CD configuration (e.g., GitHub Actions, GitLab CI, Jenkins), build this Docker image, then run pre-commit run --all-files in a container started from it, with your checked-out repository (including its .git directory) mounted into the container.

    Example (conceptual CI step):

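    docker build -t my-project-linted .
    # Mount the checkout (with .git) and trust it, since its owner differs from the container's root user
    docker run --rm -v "$PWD":/app -w /app my-project-linted \
        sh -c 'git config --global --add safe.directory /app && pre-commit run --all-files'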
    

    If pre-commit run --all-files exits with a non-zero status (because a hook failed or modified files), docker run exits non-zero too and your CI pipeline fails, flagging a code quality issue.
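As one concrete way to wire step 3 into a pipeline, here is a minimal GitHub Actions sketch. The workflow path, job name, and `my-project-linted` tag are illustrative, and it assumes the image built from your Dockerfile contains both git and pre-commit. The checkout is mounted into the container so pre-commit sees the real repository.

```yaml
# .github/workflows/pre-commit.yml (hypothetical path and names)
name: pre-commit

on: [push, pull_request]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository, including the .git directory pre-commit needs
      - uses: actions/checkout@v4

      # Build the image that carries pre-commit and the pre-installed hook environments
      - name: Build lint image
        run: docker build -t my-project-linted .

      # Mount the checkout and run every hook on every file. safe.directory avoids
      # git's "dubious ownership" error: the container runs as root, while the
      # checkout belongs to the runner user.
      - name: Run pre-commit
        run: >
          docker run --rm -v "$PWD":/app -w /app my-project-linted
          sh -c 'git config --global --add safe.directory /app
          && pre-commit run --all-files --show-diff-on-failure'
```

The `>` folded scalar joins those three lines into a single shell command, and the step fails whenever pre-commit exits non-zero; `--show-diff-on-failure` prints what the hooks changed, which is handy in CI logs.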

Scenario 2: Running Pre-Commit Hooks in a Dockerized Local Development Environment

This is more complex because pre-commit usually needs to interact directly with your host machine's Git repository. There are a few approaches, each with pros and cons:

Option A: Install pre-commit on the Host (Recommended for Local Dev)

This is generally the simplest and most robust way to use pre-commit for local development, even if your application runs in Docker.

  1. Install pre-commit on your host machine:
    pip install pre-commit
    # Or using your system's package manager, e.g., brew install pre-commit on macOS
    
  2. Navigate to your project root and run:
    pre-commit install
    
    This creates the necessary Git hooks in your .git/hooks directory.
  3. Ensure hooks have access to necessary tools: If your pre-commit hooks rely on tools that are only installed inside your Docker container (e.g., a specific Python version, a linter, or formatter), you have a few choices:
    • Install those tools on your host machine as well. (Simplest for common tools).
    • Use language: docker_image hooks in your .pre-commit-config.yaml: pre-commit then runs that hook inside an existing Docker image, so everyone uses the same tool version. The hook's entry gives the image tag, optionally preceded by --entrypoint <executable> when the image defines no ENTRYPOINT, and Docker must be installed on the host.
    • Manually modify your pre-commit hook script: You could theoretically modify the .git/hooks/pre-commit script to execute a Docker command that runs the actual linter/formatter inside your development container. This is generally discouraged as it deviates from pre-commit's standard usage and can be brittle.
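One way to express the docker_image option is as a local hook. This is a minimal sketch, assuming a hypothetical image `my-dev-image:latest` that has `black` on its PATH and defines no ENTRYPOINT; the hook id and name are illustrative too. As pre-commit documents for `docker_image` hooks, `entry` holds the image tag, optionally preceded by `--entrypoint` to choose the executable, and pre-commit appends the matching file names.

```yaml
# .pre-commit-config.yaml (excerpt); hook id and image name are hypothetical
repos:
  - repo: local
    hooks:
      - id: black-in-docker
        name: black (inside my-dev-image)
        language: docker_image
        # --entrypoint picks the executable; the last token is the image tag
        entry: --entrypoint black my-dev-image:latest
        types: [python]
```

The image must already exist locally or be pullable from a registry, because `docker_image` hooks do not build it; pre-commit's separate `language: docker` option builds from a Dockerfile in the hook repository instead.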

Why this is often preferred:

  • pre-commit is designed to run locally on your Git repository.
  • It provides immediate feedback before you even commit.
  • Avoids complexities of Docker-in-Docker or mounting Git directories.

Option B: Running pre-commit within a Development Container (e.g., VS Code Dev Containers)

If your entire development environment, including Git operations, happens inside a Docker container (like with VS Code Dev Containers), then installing pre-commit inside that container makes sense.

  1. Add pre-commit installation to your Dev Container's Dockerfile or devcontainer.json: You'd follow steps similar to the CI/CD scenario, making sure pre-commit and its hook environments are set up when the dev container builds.

    Example in a devcontainer.json for VS Code:

    {
      "name": "My Project",
      "build": { "dockerfile": "Dockerfile" },
      // Mount the host project, .git included, at /workspace and open it there
      "workspaceMount": "source=${localWorkspaceFolder},target=/workspace,type=bind",
      "workspaceFolder": "/workspace",
      // Runs once after the container is created: writes .git/hooks/pre-commit and builds the hook environments
      "postCreateCommand": "pre-commit install --install-hooks"
    }
    

    And in your Dockerfile for the dev container:

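    FROM python:3.12-slim
    WORKDIR /workspace
    
    # pre-commit needs git, which slim images do not include
    RUN apt-get update \
        && apt-get install -y --no-install-recommends git \
        && rm -rf /var/lib/apt/lists/*
    RUN pip install --no-cache-dir pre-commit
    # git may reject a bind-mounted repo owned by a different user ("dubious ownership")
    RUN git config --system --add safe.directory /workspace
    # No COPY, `git init`, or `rm -rf .git` here: the project, .git included, is bind-mounted from the host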
    

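    The key is that the .git directory from your host machine is mounted into the container, allowing pre-commit to set up the hooks directly within that mounted Git repository. Because .git/hooks is shared with the host, a commit made from the host also triggers that hook, so such commits may fail unless pre-commit is installed on the host as well.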

Option C: Running pre-commit via docker run or docker-compose exec

This is less about "installing" hooks inside the container, and more about running the pre-commit command using your container's environment. You would typically do this manually or integrate it into a wrapper script.

  1. Ensure pre-commit is installed in your Docker image (as in Scenario 1).
  2. Manually execute the checks:
    docker compose run --rm my_service pre-commit run --all-files  # use docker-compose with Compose v1
    
    Or if you just have a Dockerfile:
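    docker build -t my_linter_image . # Build the image with pre-commit (and git) installed
    # On Linux hosts, git may reject the bind-mounted repo for "dubious ownership", so trust it first
    docker run --rm -v "$(pwd)":/app -w /app my_linter_image \
        sh -c 'git config --global --add safe.directory /app && pre-commit run --all-files'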
    
    This is useful for ad-hoc checks or for a custom script that integrates with your local git commit process, but it bypasses the usual pre-commit install workflow: nothing runs automatically when you git commit.
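For the docker compose variant, the service you run has to exist in your Compose file. A minimal sketch, assuming a hypothetical `lint` service built from a Dockerfile that installs git and pre-commit (as in Scenario 1); the named volume is an optional addition that keeps pre-commit's hook cache (under /root/.cache/pre-commit when running as root) between runs.

```yaml
# docker-compose.yml (excerpt); the service and volume names are hypothetical
services:
  lint:
    build: .
    working_dir: /app
    volumes:
      # Mount the whole repository, .git included, so pre-commit can find it
      - .:/app
      # Persist hook environments across runs
      - pre-commit-cache:/root/.cache/pre-commit
    # Trust the bind-mounted repo (it may be owned by another user), then run all hooks
    command: ["sh", "-c", "git config --global --add safe.directory /app && pre-commit run --all-files"]

volumes:
  pre-commit-cache:
```

With this file, `docker compose run --rm lint` runs the default command and exits with pre-commit's status, so it can also be called from a wrapper script.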

Key Considerations:

  • Caching: pre-commit caches its hook environments. When building a Docker image for CI/CD, running pre-commit install-hooks during the build process pre-populates this cache, making subsequent pre-commit run commands faster and independent of external network access at runtime.
  • Git Repository: pre-commit needs access to a Git repository. For CI/CD, a temporary git init at build time is enough to pre-install hook environments, but pre-commit run needs the real checkout (including .git), usually mounted into the container at run time. For local dev, you usually mount your host's Git repo.
  • Performance: For local development, running pre-commit directly on your host is often faster than spinning up Docker containers for each check.
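  • Consistency: Because the shared .pre-commit-config.yaml pins each hook repository to a rev, all developers and CI/CD pipelines run the same checks at the same versions, largely regardless of their local setup (hooks using language: system still depend on locally installed tools).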

Choose the method that best fits your workflow and the needs of your team. For CI/CD, embedding pre-commit in your Dockerfile as shown in Scenario 1 is highly recommended. For local development, installing pre-commit on the host machine and letting it manage hooks against your local .git repository (Option A) is generally the most straightforward.

Posted by jbampton, Jul 31 '25 02:07