terraform-provider-google

Separate resource for Composer extra Python dependencies

Open · akerone opened this issue 2 years ago • 9 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment. If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If the issue is assigned to a user, that user is claiming responsibility for the issue. If the issue is assigned to "hashibot", a community member has claimed the issue already.

Description

TL;DR: Installing extra PyPI packages on a Composer environment should be handled by a separate resource.

When creating a Composer environment with Terraform, one of the worst-case scenarios is actually one of the most common: create an environment, wait 30 to 60 minutes for it to spin up, and watch it fail because of a conflict in Python dependencies. Rinse and repeat until you find a combination that doesn't break on that specific Composer image.

If that happens, the environment is actually created in GCP but marked as tainted in Terraform, so the next apply has to destroy the environment first and then create it again. It would be much nicer to simply retry updating the Python dependencies on the already-created environment.

Additionally, this new resource could handle the configuration needed to install packages from outside PyPI, such as from a specific repository or from Artifact Registry.
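For context, a minimal sketch of how extra packages are declared today: they live in the `software_config` block of `google_composer_environment` itself, which is why a dependency conflict fails (and taints) the whole environment. The environment name, region, and `image_version` value are placeholders, not recommendations.

```hcl
resource "google_composer_environment" "composer_env" {
  name   = "example-environment" # placeholder name
  region = "us-central1"         # placeholder region

  config {
    software_config {
      # Placeholder image alias; pick a version supported in your region.
      image_version = "composer-2-airflow-2"

      # Extra PyPI packages are part of the environment resource itself,
      # so a dependency conflict here fails the create and taints it.
      pypi_packages = {
        prettytable = "==3.4.1"
      }
    }
  }
}
```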

New or Affected Resource(s)

  • google_composer_environment
  • google_composer_extra_packages (?)

Potential Terraform Configuration

resource "google_composer_extra_packages" "python_dependencies" {
  extra_packages = {
    apache-airflow-providers-sftp = "==4.1.0"

    prettytable                    = "==3.4.1"
    apache-airflow-providers-slack = "==6.0.0"

    # Required for S3 connectivity
    apache-airflow-providers-amazon = "==6.0.0"
    boto3                           = "==1.24.89"
  }

  # URLs taken from google's documentation https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies
  extra_index_urls = [
    "https://192.0.2.10/",
    "https://us-central1-python.pkg.dev/example-project/example-repository/simple/",
    "https://example.com/"
  ]
}

References

akerone, Oct 13 '22

This could make sense to add, but we're not sure. Adding to Backlog.

rileykarson, Oct 24 '22

Faced this issue so many times... it's definitely a good idea.

YounesCoder, Oct 24 '22

By the way, I tried (and failed) to recreate this feature using a local-exec provisioner. It works the first time, but every subsequent apply uninstalls the packages again, since they're not in the software_config section.

resource "local_file" "requirements" {
  filename = "/tmp/requirements.txt"
  content  = join("\n", [for k, v in var.extra_pypi_packages : "${k}${v}"])
}

resource "null_resource" "install_pypi_packages" {
  triggers = {
    packages_map = "${md5(jsonencode(var.extra_pypi_packages))}#${md5(google_composer_environment.composer_env.config[0].airflow_uri)}"
  }

  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    command = <<-EOT
      gcloud composer environments update ${google_composer_environment.composer_env.name} \
      --location ${google_composer_environment.composer_env.region} \
      --update-pypi-packages-from-file ${local_file.requirements.filename}
    EOT
  }
}
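A possible explanation for why the packages get uninstalled again: `google_composer_environment` still owns `config[0].software_config[0].pypi_packages`, so the next plan treats whatever gcloud installed out-of-band as drift and removes it. An untested sketch that tells Terraform to ignore that attribute, assuming the environment is the `composer_env` resource referenced above; name, region, and `image_version` are placeholders.

```hcl
resource "google_composer_environment" "composer_env" {
  name   = "example-environment" # placeholder
  region = "us-central1"         # placeholder

  config {
    software_config {
      image_version = "composer-2-airflow-2" # placeholder
    }
  }

  lifecycle {
    # Packages are installed out-of-band by the null_resource/local-exec
    # workaround, so don't let later applies revert them.
    ignore_changes = [
      config[0].software_config[0].pypi_packages,
    ]
  }
}
```

The trade-off is that Terraform then stops managing PyPI packages on the environment entirely; only the null_resource trigger tracks them.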

akerone, Jan 12 '23

This is incredibly useful and a must-have IMO. Most (if not all) of our Composer BAU updates done via Terraform fail because of this. If creating a separate Terraform resource involves significant effort and cannot be done quickly, I'd like to propose a short-term solution that should greatly reduce the pain: during terraform apply, if everything except installing/updating the PyPI packages completes successfully, the resource should not be marked as tainted. The actual Composer environment in GCP is fully instantiated and usable anyway (minus the PyPI packages, which are not a core component), so it's only fair that Terraform reflects that.

This would greatly reduce iteration time, since subsequent code changes would only need to focus on the PyPI packages. Right now, because the Composer resource is marked as tainted, any change (even a minor one) forces the Composer environment to be recreated (destroy/create), which takes a very long time.

As an interim workaround, we currently create our Composer environments in two steps: in the first step the code has no PyPI packages, and once that completes successfully, we add them. This way the resource is not marked as tainted and only the PyPI package changes are picked up. It would be good to have Terraform do this automatically so we don't have to do it manually for each Composer environment (we have many environments, which makes it worse). Thoughts?
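The two-step approach can at least be driven from a single configuration with a toggle. A minimal sketch, assuming a hypothetical boolean variable `install_pypi_packages` and a map variable `extra_pypi_packages` (as in the local-exec snippet earlier in this thread): the first apply creates the environment with no packages, and a second apply with the flag set adds them as an in-place update rather than a create.

```hcl
variable "extra_pypi_packages" {
  type    = map(string)
  default = {}
}

# Hypothetical toggle: false for the first apply, true once the environment exists.
variable "install_pypi_packages" {
  type    = bool
  default = false
}

resource "google_composer_environment" "composer_env" {
  name   = "example-environment" # placeholder
  region = "us-central1"         # placeholder

  config {
    software_config {
      image_version = "composer-2-airflow-2" # placeholder
      # Empty map on the first pass, so a package conflict cannot taint the create.
      pypi_packages = var.install_pypi_packages ? var.extra_pypi_packages : {}
    }
  }
}
```

Usage: run `terraform apply` first, then `terraform apply -var="install_pypi_packages=true"`. It is still two applies, just without editing the code in between.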

mkumarpcln, Sep 01 '23


This sounds like a good idea, but yes, it's annoying and not ideal. I'm struggling to get this working.

I'm not sure whether this is a new change in the API, but when reapplying the Terraform configuration after adding a pip.conf with the correct internal package repo to the bucket, the Google API returns an error saying it can't destroy the environment because it can't find the resource URL. Has anyone else had this issue? It seems to add v1beta1/ to the URL and then can't find it. This was working before and stopped recently.

Error: googleapi: got HTTP response code 404 with body: <!DOCTYPE html>
│ <html lang=en>
│   <title>Error 404 (Not Found)!!1</title>
│   <p><b>404.</b> <ins>That’s an error.</ins>
│   <p>The requested URL <code>/v1beta1/projects/project_id_here/locations/europe-west2/environments/?alt=json&amp;prettyPrint=false</code> was not found on this server.  <ins>That’s all we know.</ins>

If this were solved as suggested above, this wouldn't even be an issue.

guicane, Nov 15 '23

This ticket is now solved by: https://github.com/hashicorp/terraform-provider-google/issues/14586

You can now provide a storage config for your Composer environment that specifies an existing GCS bucket. In this bucket you can save a pip.conf that configures access to your private PyPI. As long as the service account building the Composer environment is authorized to access the private PyPI, it will successfully install private Python packages when the environment is created. Examples are provided in the linked issue.
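A rough sketch of that setup, assuming a provider version that supports `storage_config` on `google_composer_environment`, a pre-existing bucket passed in through a hypothetical `composer_bucket` variable, and the `config/pip/pip.conf` object path from Google's Composer guide on installing Python dependencies. The index URL is the Artifact Registry example from the issue description, and `my-private-package` is a hypothetical package name.

```hcl
variable "composer_bucket" {
  type        = string
  description = "Name of an existing GCS bucket for the environment (hypothetical)."
}

# pip.conf pointing pip at a private index; Composer reads it from config/pip/.
resource "google_storage_bucket_object" "pip_conf" {
  bucket  = var.composer_bucket
  name    = "config/pip/pip.conf"
  content = <<-EOT
    [global]
    extra-index-url = https://us-central1-python.pkg.dev/example-project/example-repository/simple/
  EOT
}

resource "google_composer_environment" "composer_env" {
  name   = "example-environment" # placeholder
  region = "us-central1"         # placeholder

  storage_config {
    bucket = var.composer_bucket
  }

  config {
    software_config {
      image_version = "composer-2-airflow-2" # placeholder
      pypi_packages = {
        my-private-package = "" # hypothetical package, any version
      }
    }
  }

  # Ensure pip.conf is in the bucket before the environment installs packages.
  depends_on = [google_storage_bucket_object.pip_conf]
}
```

Note that this only addresses private repositories; a package conflict during creation would still taint the environment.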

CaliWee, Dec 19 '23

Hi again! I'm sorry for the late response.

I don't think this issue is actually solved. The main discussion here is to have the composer environment not be marked as tainted by pip dependency issues, not how to integrate private repositories.

A separate resource for the extra packages is still the best solution in my eyes, since gcloud itself creates the environment in those two steps even though it's packaged as a single command.

akerone, Feb 21 '24

Apologies, I misread your original question. I agree, the linked issue does not solve your problem.

Have you considered managing packages at the task level with PythonVirtualEnv operators?

Not a solution, but perhaps a workaround. Alternatively, there are also Docker + Kubernetes pod operators, but that's a bit more overhead.

CaliWee, Feb 21 '24


Those are indeed the two "workarounds" we're using. Actually, Composer is horrible with dependencies because of the immense number of packages included with each version, to the point where it's almost impossible to install the latest version of anything else. I think any big deployment will have to use PythonVirtualEnv and/or KubernetesPod at some point.

Despite that, there may still be a need for smaller tasks that require extra packages, and not even the latest versions of them. That's where I think extra PyPI packages play a role as the simplest possible solution. For them not to become a nightmare for platform administrators, though, the current environment creation mechanism needs to change.

As @mkumarpcln said, the best current workaround is to create the environment first and add the packages once the environment is live, which is what I'd do if I were creating it through the Google Cloud Console. In Terraform, it becomes a major pain, and I think it can be "easily" improved.

I apologize if my comments gave the impression that this is a blocking issue. I'm instead proposing an enhancement that would better reflect, in Terraform, how Google Cloud itself creates Composer environments.

Cheers!

akerone, Feb 22 '24