terraform-provider-azapi Feature request: Support JSMEPath in `output_export

Currently we need to use the Terraform functions to filter/manipulate the results from the output field. But the output always store the whole response, which increase the state file size.

The JSMEPath is a query language for JSON. refs: https://jmespath.org

There's a concern about backward compatibility, take the following json as an example:

{
  "locations": {
     "foo": "bar"
  }
}

In the current design, when user specifies the locations.foo, the output will be

{
  "locations": {
     "foo": "bar"
  }
}

But with JSMEPath, the result will be

bar

To avoid breaking changes, we might need to add a prefix for the JSMEPath query.

Jun 20 '24 03:06 ms-henglu

Awesome. Commenting to follow.

Approximately how far away is milestone v2.0.0 ?

Guessing more than one year ?

Jun 20 '24 13:06 DevopsMercenary

Love that we're doing this. Following this issue to see how the progress goes. We might be able to create a provider flag @ms-henglu to toggle on/off whether the state file should store all the output and default it to not storing. Something along the lines of store_output_in_state = false?

Jun 20 '24 16:06 stemaMSFT

Hi @stemaMSFT - I think it's not possible to allow user use the data and not store in state at same time.

Jun 21 '24 03:06 ms-henglu

@ms-henglu if we tie the behavior of the provider behind a flag, it will still need the full data? Could I not store the smaller data overhead in state, change the flag, and then let the provider fetch that data?

Jun 25 '24 01:06 stemaMSFT

There's a problem about combing multiple JMES query results.

For example, the task is retrieving the IDs and names of the automation accounts.

Solution 1: The current way is using azapi_resource_list to list all accounts and export everything in the response, then use HCL to query the data:


data "azapi_resource_list" "test" {
  type                   = "Microsoft.Automation/automationAccounts@2023-11-01"
  parent_id              = "/subscriptions/subscription_id"
  response_export_values = ["value"]
}

output "ids" {
  value = data.azapi_resource_list.test.output.value[*].id
}

output "names" {
  value = data.azapi_resource_list.test.output.value[*].name
}

But this solution stores everything in the response which makes the terraform state extremely large. So we introduce the JMES feature.

Solution 2:

data "azapi_resource_list" "test" {
  type                   = "Microsoft.Automation/automationAccounts@2023-11-01"
  parent_id              = "/subscriptions/subscription_id"
  response_export_values = ["jmes:value[*].name", "jmes:value[*].id"]
}

The problem is how to combine the multiple query results.

There're some options:

Option 1:

output = {
  "jmes:value[*].name" = ["name1", "name2"]
  "jmes:value[*].id" = ["id1", "id2"]
}

It's difficult to reference the values, for example, user needs to use ....output."jmes:value[*].name" to get the names. And when they update the expression, two places need to be updated.

Option 2:

output = {
  "jmes1" = ["name1", "name2"]
  "jmes2" = ["id1", "id2"]
}

It uses jmes1 and jmes2 as the keys, so the user could use ....output.jmes1 to reference the first jmes query result. However, when the user adds a new expression at the beginning of the response_export_values, the indexes are changed too.

Option 3:

Only support one jmes expression in response_export_values.

output = {
  "jmes" = ["name1", "name2"]
}

The JMES expression is powerful, I think supporting only one expression should be enough and in the future, it's still possible to use option 2.

Hi @stemaMSFT , @grayzu , this is the summary of the issue mentioned today, WDYT?

Jul 31 '24 06:07 ms-henglu

Adding to above, Option 4: We can have one single key "jmes" mapped to a list of all values referenced by jmes. e.g.

output = {
  "jmes" = [["name1","name2"], ["id1","id2"]]
}

When referencing in the output, we can use the index within this list to get needed data.

Aug 06 '24 05:08 hqhqhqhqhqhqhqhqhqhqhq

My thoughts on the matter, we could use a map[string] in a new input, and map[any] output attributes (to avoid breaking changes). The input map key identifies the query in the output, the map value is the JMES query:

data "azapi_resource_list" "test" {
  type                   = "Microsoft.Automation/automationAccounts@2023-11-01"
  parent_id              = "/subscriptions/subscription_id"
  response_queries = {
    names = "value[*].name"
    ids   = "value[*].id"
  }
}

output "ids" {
  value = data.azapi_resource_list.test.response_query_results.names
}

output "names" {
  value = data.azapi_resource_list.test.response_query_results.ids
}

JMES vs gjson

Please can I make an impassioned plea to use gjson as I think it's a much better choice:

accessing array members:

mylist.1.value

Equivalent splat notation:

mylist.#.name -> ["name1", "name2"]

https://github.com/tidwall/gjson/blob/master/SYNTAX.md

Aug 07 '24 12:08 matt-FFFFFF

@ms-henglu can you take a look at gjson? I’ll also take a look. Thanks for the idea @matt-FFFFFF

Aug 07 '24 15:08 stemaMSFT

Thanks @matt-FFFFFF , this is fantastic!

@stemaMSFT , about gjson vs jsme path, I think I prefer jsme path, because the azure cli also uses JSME path to query the results. refs: https://learn.microsoft.com/en-us/cli/azure/use-azure-cli-successfully-query?tabs=concepts%2Cbash

I like Matt's idea and the azure cli uses a similar design: https://learn.microsoft.com/en-us/cli/azure/use-azure-cli-successfully-query?tabs=concepts%2Cbash#rename-properties-in-a-query. I think we can keep/deprecate the old response_export_values and introduce this new response_queries. WDYT? @stemaMSFT

cc @hqhqhqhqhqhqhqhqhqhqhq

Aug 08 '24 04:08 ms-henglu

@ms-henglu

Will the solution still mark sensitive values as such?

Aug 15 '24 12:08 matt-FFFFFF

Hi @matt-FFFFFF - You mean mark the credentials in the output as sensitive? No, because the sensitive couldn't be set conditionally.

Refs: https://github.com/hashicorp/terraform-plugin-sdk/issues/736

Aug 16 '24 02:08 ms-henglu

Hi @matt-FFFFFF @stemaMSFT , also wanted to clarify about the final design. Should we:

keep response_export_values, and make the type dynamic so it can receive a list or map as value. Does both types of value support jmes, or does only the map value support jmes? Or
keep response_export_values as it is (no jmes), also add response_queries property that supports jmes.

My opinion is option 2, because the code for it will be cleaner & doesn't change the current behaviour What are your thoughts? cc @ms-henglu

Aug 16 '24 02:08 hqhqhqhqhqhqhqhqhqhqhq

Hi @hqhqhqhqhqhqhqhqhqhqhq , I just talked with @stemaMSFT , and for the 2.0 beta release, we'll go with option 1.

Keep response_export_values, and make the type dynamic so it can receive a list or map as value.
If the user uses the array, the behavior will be consistent with before and no jmes support for this case.
If the user uses the map, all values will be JMES expression, so we don't need to add the jmes: prefix.

Thanks

Aug 16 '24 05:08 ms-henglu

@ms-henglu @stemaMSFT regarding jmes vs. gjson, etc... have you considered JSONPath? currently its only one what has any JSON querying RFC, look for RFC9535.

Aug 17 '24 02:08 DariuszPorowski

Thanks @DariuszPorowski for the suggestion! But we decided to use the JSME Path because it's also used in azure cli.

Aug 30 '24 04:08 ms-henglu

This feature has been supported by https://github.com/Azure/terraform-provider-azapi/pull/591

Aug 30 '24 04:08 ms-henglu

Feature request: Support JSMEPath in `output_export_values`

JMES vs gjson