porter icon indicating copy to clipboard operation
porter copied to clipboard

Porter publish rebuild the image instead of just pushing it

Open LvffY opened this issue 2 years ago • 2 comments

Describe the bug

My porter bundle is correctly building. But for the build to pass correctly, I need to pass some --build-arg (in particular a GIT_TOKEN).

During my buid everything runs fine. Then during the publish, for no reason, the porter publish command tries to rebuid my image. Because I can't have --build-arg at this step, my build fails.

I'm thinking of a bug here, because just set some retries on my task make it pass and runs correctly (i.e just push the image).

To Reproduce

Steps to reproduce the behavior:

  1. Run
curl -L https://cdn.porter.sh/latest/install-linux.sh | bash
export PATH=~/.porter
porter mixins install --url https://github.com/LvffY/ansible-mixin/releases/download ansible
porter mixins install --feed-url https://mchorfa.github.io/porter-helm3/atom.xml helm3
porter mixins install kubernetes
porter build --build-arg GIT_TOKEN="This is a secret for build time !" --build-arg BUNDLE_USER=porter
  1. Use this porter.yaml & Dockerfile.tmpl
Dockerfile.tmpl

FROM ubuntu:20.04

## Porter init steps
## See https://porter.sh/bundle/custom-dockerfile/#porter_init
# PORTER_INIT

## Git token to use to clone
ARG GIT_TOKEN
## Mozilla SOPS version
ARG SOPS_VERSION=3.7.3

## Git script to generate password non interactively
## See https://git-scm.com/docs/gitcredentials#_requesting_credentials
ENV GIT_ASKPASS=$BUNDLE_DIR/git_askpass.sh \
    ## Ensure path contains python executable for the porter user
    PATH=$PATH:/home/${BUNDLE_USER}/.local/bin \
    ## Avoid noisy depreciation warnings in python libraries not under our responsabilities
    PYTHONWARNINGS="ignore" \
    ## Ensure Ansible will find installed collections
    ANSIBLE_COLLECTIONS_PATH="$BUNDLE_DIR/collections"

RUN apt-get update && \
    ## Install all necessary tools
    apt-get install -y --no-install-recommends wget gnupg2 python3 python3-dev python3-pip python3-venv git libpq-dev build-essential apt-transport-https ca-certificates &&  \
    ## Install SOPS
    wget https://github.com/mozilla/sops/releases/download/v$SOPS_VERSION/sops_${SOPS_VERSION}_amd64.deb && \
    dpkg -i sops_${SOPS_VERSION}_amd64.deb && \
    ## Ensure linux packages are up-to-date \
    apt-get dist-upgrade -y && \
    ## Because helm mixins install the executable under /usr/local/bin/helm3, we create a symbolic link to avoid too many configurations in our ansible playbooks.
    ## During the installation process, this link is broken.
    ## When helm mixins is installed this link will work.
    ln -sf /usr/local/bin/helm3 /usr/local/bin/helm && \
    ## Clean up image
    apt-get autoremove &&  \
    apt-get autoclean &&  \
    rm -rf /var/lib/apt/lists/* sops_${SOPS_VERSION}_amd64.deb

# Use the BUNDLE_DIR build argument to copy files into the bundle
COPY . $BUNDLE_DIR

## Clone playbook repositories
RUN echo "${GIT_TOKEN:?GIT_TOKEN is not set, please set it.}"

## Porter mixins install
## See https://porter.sh/bundle/custom-dockerfile/#porter_mixins
# PORTER_MIXINS

porter.yaml

name: porter/reproduce-porter-error
version: 0.0.9
description: "Bundle to deploy our demo apps"
registry: placeholder
dockerfile: Dockerfile.tmpl
schemaVersion: 1.0.0-alpha.1

mixins:
  - exec
  - ansible:
      otherPipDependencies:
        - virtualenv
        - psycopg2
        - openshift
  ## No explicit use of these mixins but we use this config to don't have to know how to install kubectl & helm
  - kubernetes
  - helm3:
      clientVersion: v3.10.0

parameters:
  - name: git-token
    description: Git token to use to clone ArgoCD repository
    type: string
    env: GIT_TOKEN
    sensitive: true

install:
  - exec:
      description: "Description of the command"
      command: echo # The command to run, must be on the PATH
      arguments: # arguments to pass to the command
        - "Hello install !"

upgrade:
  - exec:
      description: "Description of the command"
      command: echo # The command to run, must be on the PATH
      arguments: # arguments to pass to the command
        - "Hello upgrade !"


uninstall:
  - exec:
      description: "Description of the command"
      command: echo # The command to run, must be on the PATH
      arguments: # arguments to pass to the command
        - "Hello uninstall !"
  1. Run this porter command ...
porter publish --registry="my_acr.azurecr.io" --force ## Yes I'm using an ACR, may be the error comes from here ?
  1. See error

Expected behavior

The porter publish shoud just push my image and never rebuild it.

Porter Command and Output

  1. porter build command (went well)

porter-build.txt

  1. porter publish (went wrong then push my previously built image)

porter-publish.txt

Version

Copy the output of porter version below

porter v1.0.0 (89ad7ed6)

Other hypothesis

My image is kind of big (2GB+). May be the issue could some kind of caches that takes a long time to update, so the publish tries to rebuild an unfound image (in its cache) then finally found it after some retries ?

Azure pipelines

I don't think this a workstation issue, because I'm runnning my builds in Azure pipeline in some Microsoft hosted agents (i.e kind of very general VMs and that could be considered clean every time).

For information, here are the relevant steps on my pipeline :

azure-pipelines.yaml

stages:
  - stage: snapshot
    displayName: Build snapshot porter app
    pool:
      vmImage: ubuntu-latest
    jobs:
      - job: snapshot
        displayName: Build snapshot porter app
        steps:
          - checkout: self

          - task: Docker@2
            displayName: Login to ACR
            inputs:
              command: login
              containerRegistry: my_acr

          - task: Bash@3
            displayName: Install porter
            inputs:
              targetType: inline
              script: |
                set -e
                curl -L https://cdn.porter.sh/latest/install-linux.sh | bash
                echo "##vso[task.setvariable variable=PATH]${PATH}:~/.porter"

          - task: Bash@3
            displayName: Porter build
            inputs:
              targetType: inline
              script: |
                set -e
                porter mixins install --url https://github.com/LvffY/ansible-mixin/releases/download ansible
                porter mixins install --feed-url https://mchorfa.github.io/porter-helm3/atom.xml helm3
                porter mixins install kubernetes
                porter build --build-arg GIT_TOKEN=$GIT_TOKEN --build-arg BUNDLE_USER=porter
            env:
              GIT_TOKEN: "This is a secret for build time !"

          - task: Bash@3
            displayName: Porter publish
            retryCountOnTaskFailure: 10
            inputs:
              targetType: inline
              script: |
                set -e
                porter publish --registry="my_acr.azurecr.io" --force

LvffY avatar Oct 07 '22 06:10 LvffY

Porter detects if the bundle definition has changed since the last build and we do a JIT build before publish. Seems like we have at least one bug around --build-arg that is causing it to always detect that it's out of date, but I suspect that there are more ways to trigger a build when we don't truly need it.

Unpredictably rebuilding (incorrectly to boot because it's missing the build args), isn't acceptable, sorry about the bug...

Couple thoughts:

  1. We could introduce a setting to turn off the rebuild check. But honestly if we have such little faith that it's correct, I'd rather remove the behavior (or make it opt-in) though I'm unsure about how to go about changing behavior after 1.0 without causing further problems.
  2. If we keep the behavior, the publish command needs to expose all the same flags as build which... just makes me want to rip out the feature even more. 😂
  3. Rebuild isn't terribly smart, as it only detects a small subset of changes that indicate the build is stale. We have an open request to support detecting changes to any files included in the docker context (#1818).

@squillace @vdice Do you have any thoughts on this?

carolynvs avatar Oct 07 '22 14:10 carolynvs

Unpredictably rebuilding (incorrectly to boot because it's missing the build args), isn't acceptable, sorry about the bug...

@carolynvs no problem. I just had to search quite a long time to isolate the bug and found that a simple retry make it pass x).

TBH I'm more fan of publish to expose the same as build. In my mind, in my pipelines at least, I would just have to run porter publish to build & publish my bundle.

LvffY avatar Oct 07 '22 14:10 LvffY

Here's what I think would work to make sure autobuild doesn't have false positives (rebuilds when nothing has changed) and doesn't ever helpfully rebuild with the wrong settings:

Opt-out

See #2531 for the tracking issue Add a configuration setting that disables the autobuild entirely, letting people opt out when it isn't the desired behavior. The setting must default to false and be worded such that the grammar works out. So for example, we can't do --autobuild and have it default to true, but something like --disable-autobuild would be fine. The setting will be a flag on porter publish and other commands, and also available as an environment variable and configuration setting.

Remove false positive

Fix the autobuild check to more accurately detect when the bundle has changed so that it doesn't trigger a build when nothing is different (#2503). Right now something is causing porter to think a rebuild is necessary and the users disagree. We don't have to perfect rebuild in all cases, but we shouldn't rebuild when nothing is different.

Mimic Docker better

This is not strictly needed but would be a welcome improvement to autobuild.

Improve the autobuild check to be consistent with docker build, i.e. take into account all the files in the docker context that could have changed (and trigger an autobuild) instead of only porter.yaml. That way modifying helpers.sh or a terraform module would also be detected and a build triggered.

Right now we store a hash of porter.yaml in the generated bundle.json and use that to tell when an autobuild is required We could store additional information about the build context (e.g. last build time, hash of files, were custom build flags used) in the bundle directory and exclude them from the docker context so that we can use that to determine if we should build without adding unnecessary info to the bundle.

Disable autobuild when custom build flags are used

When the user built a bundle with custom flags, do not autobuild from other commands such as explain, install, inspect, etc. Instead porter should halt and instruct the user to repeat the build command with the desired custom flags specified.

Alternatives that were considered:

  • Remember the flags used on build and re-use them when doing an autobuild. I am concerned that because of the sensitive nature of the build flags that can be passed (e.g. docker secrets, ssh connections, etc) that this could be a security problem at worst down the road, and at best be too much magic that the user may not have expected. I'd also like to move autobuild towards being more stable and less "tiny trickle of bugs coming in around the feature" and I see automatically reusing the more complex docker build flags as something that will lead to more bugs down the road.
  • Add all of the docker build flags to all other commands so that autobuild can use them. This is a non-starter for me. The breadth of commands impacted is pretty large and these commands have nothing to do with build (e.g. porter explain). I don't want to litter the other commands with unrelated flags or encourage people to rely on autobuild in ways it wasn't intended for, such as not having to call build before publish in a CI pipeline. The extra call to build isn't such a burden that the cost of dev time and usability is worth it IMO.

I'll let this suggestion sit for a bit to gather feedback and then split these up into separate issues.

carolynvs avatar Jan 20 '23 15:01 carolynvs

I agree with everything. Actually, just remove false positive would be enough to me, but the mimic docker better is a great idea.

LvffY avatar Jan 20 '23 17:01 LvffY

I think this is a good plan. I imagine that we'll turn off autobuild for the near future.

tamirkamara avatar Jan 23 '23 13:01 tamirkamara

I have created a new issue that hopefully explains the opt-out config flag in enough detail that if anyone is interested in contributing, it should be ready for someone to implement. If not, please feel free to ask questions on the issue or open a draft PR and ask there.

https://github.com/getporter/porter/issues/2531

carolynvs avatar Jan 23 '23 20:01 carolynvs

Hey! It looks like this issue was fixed by #2733 - I'm going to close this issue for now. Thank you for your contributions!

schristoff avatar May 08 '23 18:05 schristoff