
Allow only a specific list of providers (allowlist)

Open minamijoyo opened this issue 3 years ago • 12 comments

Hi there,

As the documentation already describes, protecting terraform apply alone is not enough against malicious code. It's possible to inject arbitrary code at the plan phase using the external data source or a malicious provider, so we also need a security check mechanism before plan.

https://github.com/runatlantis/atlantis/blob/v0.17.5/runatlantis.io/docs/security.md#protect-terraform-planning

Protect Terraform Planning

If attackers submitting pull requests with malicious Terraform code is in your threat model then you must be aware that terraform apply approvals are not enough. It is possible to run malicious code in a terraform plan using the external data source or by specifying a malicious provider. This code could then exfiltrate your credentials. To prevent this, you could:

  1. Bake providers into the Atlantis image or host and deny egress in production.
  2. Implement the provider registry protocol internally and deny public egress, that way you control who has write access to the registry.
  3. Modify your server-side repo configuration's plan step to validate against the use of disallowed providers or data sources or PRs from not allowed users. You could also add in extra validation at this point, e.g. requiring a "thumbs-up" on the PR before allowing the plan to continue. Conftest could be of use here.

I understand that this is a common problem not only for Atlantis but also for other Terraform CI/CD solutions. I'd like to discuss how to implement protections and mitigate the risk in Atlantis. If the Atlantis community recommends the above, it would be great to have some built-in features or instructions on how to implement them.

I feel options 1 and 2 are a heavy operational burden for someone like me who wants to update providers frequently. I guess we could automate the process, but it would require users to set up a lot of components, and that's not trivial work.

My current attempt is a variant of option 3. I'm using a custom workflow with the following script to check that the selected providers are allowed:

repos.yaml


repos:
- id: /.*/

workflows:
  default:
    plan:
      steps:
        - init
        - run: check_provider_allowlist.sh
        - plan

check_provider_allowlist.sh

#!/bin/bash

set -euo pipefail

# An example for provider allow list
PROVIDER_ALLOWLIST=(
  'registry.terraform.io/hashicorp/aws'
  'registry.terraform.io/integrations/github'
)

is_allowed() {
  local selection=$1
  for allow in "${PROVIDER_ALLOWLIST[@]}"
  do
    if [[ "$allow" == "$selection" ]]; then
      # found
      return 0
    fi
  done

  # not found
  return 1
}

# The terraform version command returns not only terraform core version,
# but also selected provider's versions.
mapfile -t PROVIDER_SELECTIONS < <(terraform version -json | jq -r '.provider_selections | keys | .[]')

for selection in "${PROVIDER_SELECTIONS[@]}"
do
  if ! is_allowed "$selection" ; then
    echo "provider not in allowlist: $selection" >&2
    exit 1
  fi
done
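For reference, a hypothetical sample of the JSON the script parses (versions and providers are illustrative; the real output comes from running terraform version -json after terraform init):

```shell
# Hypothetical `terraform version -json` output piped through the same jq
# filter the script uses to list the selected provider addresses.
printf '%s' '{
  "terraform_version": "1.0.10",
  "provider_selections": {
    "registry.terraform.io/hashicorp/aws": "3.64.2",
    "registry.terraform.io/integrations/github": "4.17.0"
  },
  "terraform_outdated": false
}' | jq -r '.provider_selections | keys | .[]'
# → registry.terraform.io/hashicorp/aws
# → registry.terraform.io/integrations/github
```

Each emitted address is then compared against the allowlist entry by entry.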

This simple script works fine for me with the current latest Atlantis v0.17.5 + Terraform v1.0.10.

You might think it would be nicer to check this with a generic mechanism like conftest, but as far as I know, the current policy check feature in Atlantis runs conftest after plan, which is not enough for protection at the plan phase. In addition, a simple static analysis of the root module is not enough, because provider dependencies can also be injected indirectly via module dependencies. This means we would need to fetch modules and check them recursively; not easy, but probably possible. To avoid reimplementing everything for Terraform initialization and configuration parsing, I simply parse the result of the terraform version command in the script.

This may not prevent all attacks, but I think it's a reasonable way to reduce the security risk at the plan phase for many users. If you like this idea, I'm happy to write a patch to add the check as a built-in feature.

Finally, please let me know if there is a better way. Other suggestions are always welcome. Thanks!

minamijoyo avatar Nov 11 '21 00:11 minamijoyo

We are using a terragrunt based workflow, which adds a few more possibilities to execute code server-side, even during terragrunt init. I have implemented pre-init and pre-plan conftest policies to mitigate these risks, e.g. denying blacklisted functions or blocks.

The pre-init checks are necessary to catch specific before_hook and after_hook terragrunt blocks.

The pre-plan checks validate providers against the whitelist and flag external data sources. This step runs after init, so it also catches blacklisted functions in all modules.

Extract of our custom plan workflow steps:

{
  "env": {
    "name": "REPO_BASE_DIR",
    "command": "echo \"${DIR%$REPO_REL_DIR}\""
  }
},
{
  "run": "silent find \"$REPO_BASE_DIR\" \\( -name \"*.tf\" -or -name \"*.hcl\" \\) -exec conftest test --policy ~/policy/pre-init --parser hcl2 \\{\\} \\+"
},
{
  "run": "silent terragrunt init --terragrunt-check -no-color"
},
# some more steps: hclfmt, fmt, validate, tflint, checkov, etc
{
  "run": "silent find \"$REPO_BASE_DIR\" \\( -name \"*.tf\" -or -name \"*.hcl\" \\) -exec conftest test --policy ~/policy/pre-plan --parser hcl2 \\{\\} \\+"
},

Note: The silent command only prints to stdout if the executed command exits non-zero.
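The silent command itself isn't shown in the extract; a minimal sketch of such a wrapper, based only on the description above, could look like:

```shell
# Sketch of a `silent` wrapper (assumption: not the author's actual helper).
# Run the command, buffer its output, print the output only if the command
# exits non-zero, and preserve the original exit code.
silent() {
  local out rc
  out=$("$@" 2>&1)
  rc=$?
  if [ "$rc" -ne 0 ]; then
    printf '%s\n' "$out"
  fi
  return "$rc"
}
```

With this, a step like silent terragrunt init stays quiet on success and dumps the full output on failure, which keeps Atlantis PR comments readable.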

My implementation for whitelisting providers (after init, pre-plan):

package rules.whitelisted_providers

whitelisted_provider_sources := {
  "hashicorp/aws",
  "integrations/github"
}

# check list of required providers
deny[msg] {
  provider_source := input.terraform.required_providers[provider_name].source
  not whitelisted_provider_sources[provider_source]
  msg := sprintf("provider `%v` from `%v` found", [provider_name, provider_source])
}

# Required provider definition is not mandatory.
# Hence, also checking resource and data source prefix against whitelist

not_whitelisted(resource_type) {
  count({provider |
    whitelisted_provider_sources[provider]
    provider_prefix = split(provider, "/")[1]
    startswith(resource_type, provider_prefix)
  }) == 0
}

deny[msg] {
  input.data[_][resource_type][resource_label]
  not_whitelisted(resource_type)
  msg := sprintf("data source `%v.%v` found", [resource_type, resource_label])
}

deny[msg] {
  input.resource[_][resource_type][resource_label]
  not_whitelisted(resource_type)
  msg := sprintf("resource `%v.%v` found", [resource_type, resource_label])
}

Comments are appreciated. Maybe the current policy_sets could be extended to support different sets of policies for specific stages in the workflow!? Remark: obviously, the above workaround cannot leverage the built-in policy approval.

mhennecke avatar Nov 14 '21 10:11 mhennecke

Hi @mhennecke, Thank you for your suggestion!

I'm not familiar with terragrunt, but does it work because terragrunt keeps a module cache? At least with plain terraform, the .terraform/modules directory contains only caches of external modules, not local subdirectory ones; the local ones are just recorded as references in the manifest file .terraform/modules/modules.json. So I think the .terraform/modules directory is not enough for validating configurations with conftest when there are local module dependencies.

minamijoyo avatar Nov 18 '21 01:11 minamijoyo

Yes, I totally agree. However, you should be able to use the above conftest logic in a plain terraform workflow. I think you would only need to add the relevant local module folders (from modules.json?) to the find ... -exec ... so that conftest checks those as well.

terragrunt creates a cache with external as well as local modules which makes it a little simpler.

mhennecke avatar Nov 19 '21 23:11 mhennecke

Yes, I think we can parse .terraform/modules/modules.json, retrieve the local module paths, and validate them with conftest. The only caveat is that the contents of the .terraform/ directory are implementation details, not a public interface of the Terraform CLI.
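A sketch of that parsing, assuming the modules.json shape Terraform currently writes ({"Modules":[{"Key":...,"Source":...,"Dir":...}]}); the sample manifest and module names below are made up:

```shell
# List the local module directories recorded by `terraform init`
# (local sources start with ./ or ../; external ones do not).
printf '%s' '{"Modules":[
  {"Key":"","Source":"","Dir":"."},
  {"Key":"vpc","Source":"./modules/vpc","Dir":"modules/vpc"},
  {"Key":"ext","Source":"terraform-aws-modules/vpc/aws","Dir":".terraform/modules/ext"}
]}' | jq -r '.Modules[] | select(.Source | startswith("./") or startswith("../")) | .Dir'
# → modules/vpc
# Each printed directory could then be checked, e.g.:
#   conftest test --policy ~/policy/pre-plan --parser hcl2 "$dir"/*.tf
```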

minamijoyo avatar Nov 25 '21 01:11 minamijoyo

@mhennecke am I correct in thinking that your approach can be bypassed by using JSON configs? As the .tf.json files just won't be found at all. (https://www.terraform.io/docs/language/syntax/json.html)
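For what it's worth, a sketch of closing that gap in the find-based steps; the file names are illustrative, and routing *.tf.json through conftest's json parser is an assumption that would need verifying against the actual policies:

```shell
# Demo: a find matching only *.tf and *.hcl misses JSON-syntax configuration.
dir=$(mktemp -d)
touch "$dir/main.tf" "$dir/terragrunt.hcl" "$dir/sneaky.tf.json"
find "$dir" \( -name '*.tf' -o -name '*.hcl' \) | wc -l                       # 2 of 3
find "$dir" \( -name '*.tf' -o -name '*.hcl' -o -name '*.tf.json' \) | wc -l  # all 3
# *.tf.json files would then go to `conftest test --parser json`; Terraform's
# JSON syntax mirrors the HCL structure, but parser output may differ enough
# that the policies need adjusting.
```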

bobtfish avatar Dec 02 '21 13:12 bobtfish

@bobtfish did you find an answer to your question? I'm implementing Atlantis and I'm trying to address the vulnerabilities mentioned in the docs.

y2abba avatar Mar 22 '23 22:03 y2abba

Today I saw this: https://github.com/Madh93/tpm. Maybe it could help to manage providers outside of Terraform.

jamengual avatar May 03 '23 15:05 jamengual

You can also do it with Rego policies using conftest.

nitrocode avatar May 04 '23 01:05 nitrocode

We wanted to use conftest as a validation mechanism for our Atlantis, and by adapting the solution outlined by @minamijoyo and @mhennecke we were able to achieve the intended result.

Since we use multiple providers in our organization, we decided to take a blacklist approach. The solution was implemented in 3 steps.

1 - A script that parses the result of the terraform providers command, eliminates duplicates, and loops a conftest validation over the identified providers.

#!/bin/bash

## Retrieve providers downloaded by terraform init
providers_output=$(terraform providers)

## Parse result to output only the provider names
provider_names=$(echo "$providers_output" | grep -o 'provider\[.*\]' | awk -F ']' '{print $1"]"}' | sed 's/provider\[//;s/\]//')

## Eliminate duplicate providers
unique_provider_names=$(echo "$provider_names" | awk '!seen[$0]++')

## Set policy directory
POLICY_DIR="/path/to/rego/policy"

## Loop through the identified providers and run conftest against the blacklist-providers policy
violated=false
for provider in $unique_provider_names; do
    json="{\"providers\": \"$provider\"}"
    if ! conftest test -p "$POLICY_DIR" - <<< "$json" >/dev/null; then
        violated=true
        echo "$json" | conftest test -p "$POLICY_DIR" -
    fi
done

## Check whether any provider violated the policy and exit accordingly
if [ "$violated" = true ]; then
    exit 1
else
    exit 0
fi

This script exits with an error if any blacklisted provider is identified. The final loop surfaces all blacklisted providers in the PR comment rather than stopping at the first violation.

We opted to run the validation before the plan, since some of our repositories have long plan times and running it afterwards would significantly increase the time needed for the full operation.

Also, by using terraform providers we are able to identify both providers declared explicitly in the configuration and those required implicitly by external modules.

This script iteration doesn't take provider versions into account, but that could be achieved fairly easily by modifying the parsing to include the version. It was not in our scope for now.
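For reference, here is the parsing pipeline from step 1 run over a hypothetical terraform providers output (tree glyphs, versions, and provider names are illustrative):

```shell
# Hypothetical `terraform providers` output fed through the script's pipeline:
# extract provider[...] tokens, strip the wrapper, and de-duplicate.
printf '%s\n' \
  'Providers required by configuration:' \
  '.' \
  '├── provider[registry.terraform.io/hashicorp/external] >= 2.2.0' \
  '└── module.vpc' \
  '    └── provider[registry.terraform.io/hashicorp/external]' |
grep -o 'provider\[.*\]' |
awk -F ']' '{print $1"]"}' |
sed 's/provider\[//;s/\]//' |
awk '!seen[$0]++'
# → registry.terraform.io/hashicorp/external
```

The duplicate from the module is collapsed, so conftest runs once per distinct provider.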

2 - A Rego policy containing all providers we want to blacklist

package main

##Blacklisted providers

not_allowed_providers = {
    "registry.terraform.io/hashicorp/external"
}

blacklist_providers[provider]{
    provider := input.providers
    not_allowed_providers[provider]
}

deny[msg] {
    count(blacklist_providers) > 0
    msg := sprintf("Provider %s is not authorized", [blacklist_providers[_]])
}

3 - Adapt the default workflow in repos.yaml to run the script as a pre-plan step

workflows:
  default:
    plan:
      steps:
        - init
        - run:
            command: bash path/to/script
            output: show
        - plan

In the end, if a blacklisted provider is identified, we get the following comment in the PR.

(screenshot: the PR comment reporting the blocked provider)

Hopefully, if a feature lands that lets us specify when our policies run, we can adapt the policy above to achieve the same end result.

Any inputs are highly appreciated.

barrosoj avatar Sep 26 '23 10:09 barrosoj

I am currently working on enhancing the security of Terraform and have a similar scenario that requires protection. We permit providers such as Helm or Kubernetes, and these providers can execute commands (exec) to obtain the credentials needed to connect to Kubernetes. The concern is that this exec command can be easily modified in any pull request (PR), potentially allowing someone to insert custom code and leak sensitive information.

While I am considering implementing Conftest or a similar approach, as recommended in the documentation, it would be highly beneficial if Atlantis could support the approval of pre-plan policies.

In the case of @barrosoj 's example, although terraform plan fails, there doesn't seem to be a straightforward way to authorize the violation using atlantis approve_policies.

Could we explore the possibility of implementing atlantis approve_policies in a pre-plan phase?

To fortify the pipeline, I am proposing the following steps:

  1. Run terraform init.
  2. Execute terraform plan -refresh-only -out=tfplan. Running a refresh-only plan avoids executing the exec command that might leak data or run malicious code on our servers.
  3. Generate a JSON representation of the plan: terraform show -json tfplan > tfplan.json.
  4. Conduct policy checks using Conftest: conftest test -p /mydir tfplan.json.

This approach triggers a pre-plan alert, preventing a full terraform plan from being executed if any policy violations are detected. Integrating a pre-plan policy check into Atlantis would be a valuable security addition.

Edited: In the documentation:

Modify your server-side repo configuration's plan step to validate against the use of disallowed providers or data sources or PRs from not allowed users. You could also add in extra validation at this point, e.g. requiring a "thumbs-up" on the PR before allowing the plan to continue. Conftest could be of use here.

Any ideas on how to achieve the "requiring a thumbs-up on the PR before allowing the plan to continue" part?

Thanks

marcportabellaclotet-mt avatar Dec 16 '23 10:12 marcportabellaclotet-mt

Can this be done by configuring provider_installation in .terraformrc? This definitely has some limitations, though. I'm unclear on whether it would affect providers that have already been downloaded and cached. If cached plugins are not checked, then you'd either have to use a single provider list for all workflows or a different plugin cache directory for each workflow.

blowfishpro avatar Feb 27 '24 19:02 blowfishpro

The provider_installation feature indeed functions as intended. I experimented with it back in Terraform v1.6: even with the providers cached, running terraform init before the plan stage results in errors, which aligns well with the typical Atlantis workflow. So this shouldn't pose an issue in Atlantis setups.

The only concern is that this feature is designed with usage in network-isolated environments in mind. There's a possibility that its behavior might change in future versions if it's used in ways divergent from its intended purpose. Keeping this in mind, we can achieve what we need with Terraform's native capabilities.
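For readers who want to try this, a minimal sketch of such a CLI configuration; the provider addresses are illustrative, and the exact pattern syntax should be checked against the provider_installation documentation:

```hcl
# ~/.terraformrc (sketch): only the listed providers can be installed
# directly from their origin registries; `terraform init` fails for any
# provider that matches no installation method.
provider_installation {
  direct {
    include = [
      "registry.terraform.io/hashicorp/aws",
      "registry.terraform.io/integrations/github",
    ]
  }
}
```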

minamijoyo avatar Mar 14 '24 02:03 minamijoyo