terraform-provider-google
Support for flex template runtime environment options
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment. If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If the issue is assigned to a user, that user is claiming responsibility for the issue. If the issue is assigned to "hashibot", a community member has claimed the issue already.
Description
The projects.locations.flexTemplates.launch API used by google_dataflow_flex_template_job takes a LaunchFlexTemplateParameter object as its payload. LaunchFlexTemplateParameter has an environment field that takes a FlexTemplateRuntimeEnvironment object, and this is where callers can specify standard Dataflow job options such as the region/zone, the number of workers, and the network/subnetwork.
Unfortunately, the google_dataflow_flex_template_job resource in the google-beta provider does not currently expose the environment field, so it is not possible to pass any of these options to the job. As a workaround, they can be specified in the parameters section, but this requires that the parameter names be listed in the job's metadata.json file; any parameter not listed there cannot be passed.
New or Affected Resource(s)
- google_dataflow_flex_template_job
Potential Terraform Configuration
```hcl
resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"
  parameters = {
    inputSubscription = "messages"
  }
  environment = {
    num_workers             = 8
    machine_type            = "n1-standard-8"
    enable_streaming_engine = true
    additional_user_labels = {
      billing_component = "dataproc"
    }
  }
}
```
b/374162542
As a postscript here, I believe that this should actually be considered a bug. Consider the following resource definition:
```hcl
resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"
  parameters = {
    inputSubscription = "messages"
    labels            = "billing_component=dataproc"
  }
}
```
The flex template launcher will pass the labels param to the job and duly apply those labels to it. But because labels are part of the job state, Terraform will, on its next run, see the job as differing from the configured state and try to update it; in essence the resource becomes permanently tainted:
```
  # google_dataflow_flex_template_job.big_data_job will be updated in-place
  ~ resource "google_dataflow_flex_template_job" "big_data_job" {
        id     = "2021-08-22_14_10_06-17678439902224270650"
      ~ labels = {
          - "billing_component" = "dataproc" -> null
        }
        name   = "dataflow-flextemplates-job"
        # (7 unchanged attributes hidden)
    }
```
@rileykarson is there any timeframe for getting this addressed that you can share? This effectively blocks us (and, I suspect, other people) from migrating to flex templates generally, since it is not possible to accurately track spend on flex jobs without using labels.
No timeline- I'm not certain that's entirely a bug either, just incorrectly specified. You could define the label in both places, right?
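A minimal sketch of that suggestion, assuming the label is defined both as the resource's labels argument (so Terraform's view matches the job) and as a template parameter (so the launcher actually applies it); values are the placeholders from the earlier example:

```hcl
resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"
  # Declared here so Terraform state agrees with the labels the launcher applies.
  labels = {
    billing_component = "dataproc"
  }
  parameters = {
    inputSubscription = "messages"
    labels            = "billing_component=dataproc"
  }
}
```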
Hi,
any updates regarding this?
From my perspective, without those options it is quite hard to use this resource from Terraform (relying on the default network is an anti-pattern, so in real projects the network needs to be defined explicitly, along with other parameters like subnetwork, service account, IP configuration, etc.).
I've checked, and the workaround suggested by @n-oden works, but it requires declaring the parameters in the metadata.json file, like:
```json
"parameters": [
  {
    "name": "network",
    "label": "Network.",
    "helpText": "empty."
  },
  {
    "name": "subnetwork",
    "label": "Subnetwork",
    "helpText": "empty."
  },
  {
    "name": "service_account_email",
    "label": "Service account email.",
    "helpText": "empty."
  },
  {
    "name": "ip_configuration",
    "label": "IP Configuration.",
    "helpText": "empty."
  }
]
```
And then, in the Terraform definition:
```hcl
parameters = {
  input_subscription = "projects/my-project/subscriptions/my-sub"
  output_table       = "my-project:my-dataset.my-table"
  # HACK for GCP Terraform setup, as these parameters are not supported at job level
  network               = "my-network"
  subnetwork            = "my-subnetwork"
  service_account_email = "my-service-account-fqn"
  ip_configuration      = "WORKER_IP_PRIVATE"
}
```
Based on the above, it would be nice to have options similar to those of google_dataflow_job; otherwise, from my perspective, every flex template has to be polluted with these parameters.
So I would expect:
```hcl
resource "google_dataflow_flex_template_job" "dataflow_demo_job" {
  parameters = {
    input_subscription = "projects/my-project/subscriptions/my-sub"
    output_table       = "my-project:my-dataset.my-table"
  }
  network               = "my-network"
  subnetwork            = "my-subnetwork"
  service_account_email = "my-service-account-fqn"
  ip_configuration      = "WORKER_IP_PRIVATE"
}
```
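For comparison, the existing google_dataflow_job resource (for classic templates) already exposes these options as top-level arguments. A sketch with placeholder values (the bucket, template path, and resource names here are hypothetical):

```hcl
# Classic-template resource: network and identity options are first-class
# arguments rather than smuggled through `parameters`. Values are placeholders.
resource "google_dataflow_job" "classic_demo_job" {
  name              = "dataflow-classic-job"
  template_gcs_path = "gs://my-bucket/templates/classic-template"
  temp_gcs_location = "gs://my-bucket/tmp"

  network               = "my-network"
  subnetwork            = "regions/us-central1/subnetworks/my-subnetwork"
  service_account_email = "my-service-account-fqn"
  ip_configuration      = "WORKER_IP_PRIVATE"
  max_workers           = 8
  machine_type          = "n1-standard-8"

  parameters = {
    input_subscription = "projects/my-project/subscriptions/my-sub"
    output_table       = "my-project:my-dataset.my-table"
  }
}
```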
> No timeline- I'm not certain that's entirely a bug either, just incorrectly specified. You could define the label in both places, right?
@n-oden - I just understood what @rileykarson means here.
The workaround for the [FlexTemplateRuntimeEnvironment](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.flexTemplates/launch#FlexTemplateRuntimeEnvironment) parameters is to define them in the parameters section.
The workaround for labels is to both define them in the parameters section and use the deprecated labels argument:
```hcl
locals {
  my_labels = { foo = "bar" }
}

resource "google_dataflow_flex_template_job" "my_pipeline" {
  provider = google-beta
  name     = "my-pipeline"
  ...
  # this argument is 'deprecated', but negates the no-op changes
  labels = local.my_labels
  parameters = {
    labels          = jsonencode(local.my_labels)
    maxNumWorkers   = 5
    zone            = "europe-west2-c"
    serviceAccount  = "[email protected]"
    stagingLocation = "gs://${local.dataflow_bucket_name}/stage/"
    tempLocation    = "gs://${local.dataflow_bucket_name}/temp/"
    network         = "https://www.googleapis.com/compute/v1/projects/my-proj/global/networks/my-net"
    subnetwork      = "https://www.googleapis.com/compute/v1/projects/my-proj/regions/europe-west2/subnetworks/my-subnet"
  }
}
```
Not sure if this is the exact same issue, but we are passing the service account in the parameters section, in the same way as the official Google module does:
```hcl
java_pipeline_options = {
  serviceAccount        = var.service_account_email
  subnetwork            = var.subnetwork_self_link
  dataflowKmsKey        = var.kms_key_name
  tempLocation          = var.temp_location
  stagingLocation       = var.staging_location
  maxNumWorkers         = var.max_workers
  usePublicIps          = var.use_public_ips
  enableStreamingEngine = var.enable_streaming_engine
}

python_pipeline_options = {
  service_account_email   = var.service_account_email
  subnetwork              = var.subnetwork_self_link
  dataflow_kms_key        = var.kms_key_name
  temp_location           = var.temp_location
  staging_location        = var.staging_location
  max_num_workers         = var.max_workers
  no_use_public_ips       = !var.use_public_ips
  enable_streaming_engine = var.enable_streaming_engine
}

pipeline_options = var.job_language == "JAVA" ? local.java_pipeline_options : local.python_pipeline_options
```
This worked fine until now, but we are now seeing this error:
```
serviceAccount: Runtime parameter serviceAccount should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.
Details:
[
  {
    "@type": "type.googleapis.com/google.dataflow.v1beta3.InvalidTemplateParameters",
    "parameterViolations": [
      {
        "description": "Runtime parameter serviceAccount should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.",
        "parameter": "serviceAccount"
      }
    ]
  }
]
, badRequest
```
Since Terraform does not allow us to pass parameters in the environment section, as it is not exposed, how can it produce this error?
> Runtime parameter serviceAccount should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.
That issue is from https://github.com/hashicorp/terraform-provider-google/issues/14679
Using the workaround for labels from @ben-marengo-msmg above, we get the following error with the google-beta provider v5.0.0:
```
Error: googleapi: Error 400: The template parameters are invalid. Details:
labels: Runtime parameter labels should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.
```
Specifying labels in only the environment field does not apply them to the Dataflow flex template job, and specifying them in parameters still leaves the Terraform state not matching the labels configured on the job, as per @n-oden's earlier comment.
I think this should be considered a bug.
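For anyone landing here later: newer releases of the google-beta provider did eventually expose many of the FlexTemplateRuntimeEnvironment options as top-level arguments on google_dataflow_flex_template_job. A sketch, assuming a provider version that includes these fields (the exact fields and the version that introduced them should be checked against the provider changelog; values are placeholders from earlier in the thread):

```hcl
# Sketch only: requires a google-beta provider version that has added these
# top-level arguments to google_dataflow_flex_template_job.
resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"

  num_workers             = 8
  machine_type            = "n1-standard-8"
  enable_streaming_engine = true
  service_account_email   = "my-service-account-fqn"
  network                 = "my-network"
  subnetwork              = "my-subnetwork"
  ip_configuration        = "WORKER_IP_PRIVATE"
  labels = {
    billing_component = "dataproc"
  }

  parameters = {
    inputSubscription = "messages"
  }
}
```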