Existing AccessPolicyTokens get deleted and stuck on provider-grafana pod restart
When the provider-grafana pod is restarted, it deletes the existing tokens behind any AccessPolicyToken resources that are present and then gets stuck attempting to recreate them.
Running Crossplane version 1.16.0 and provider-grafana 1.8.0.
Provider logs:
2024/09/06 12:08:09 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:08:09 [DEBUG] DELETE https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:08:10 [DEBUG] POST https://grafana.com/api/v1/tokens?region=
2024/09/06 12:08:11 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:08:13 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:08:17 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:08:25 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:08:42 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:09:14 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:10:14 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:11:14 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
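The telling line in the log above is the POST with an empty `region=` query parameter. As a rough diagnostic aid, here is a minimal Python sketch that scans provider debug output for such requests. It assumes only the `[DEBUG] METHOD URL` line shape shown above; the function name `requests_missing_region` is made up for illustration and is not part of the provider.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the "[DEBUG] METHOD URL" shape seen in the provider logs above.
LOG_LINE = re.compile(r"\[DEBUG\]\s+(?P<method>[A-Z]+)\s+(?P<url>\S+)")


def requests_missing_region(log_text):
    """Return (method, url) pairs whose `region` query parameter is absent or empty."""
    flagged = []
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a request line
        query = parse_qs(urlsplit(match["url"]).query, keep_blank_values=True)
        # parse_qs returns a list per key; a blank value shows up as [""].
        if not any(query.get("region", [])):
            flagged.append((match["method"], match["url"]))
    return flagged


sample = """\
2024/09/06 12:08:09 [DEBUG] DELETE https://grafana.com/api/v1/tokens/<token-id>?region=us
2024/09/06 12:08:10 [DEBUG] POST https://grafana.com/api/v1/tokens?region=
2024/09/06 12:08:11 [DEBUG] GET https://grafana.com/api/v1/tokens/<token-id>?region=us
"""

for method, url in requests_missing_region(sample):
    print(method, url)
```

Run against the excerpt, this flags only the POST, which matches the "Field is required: region" error in the events.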
Events from the AccessPolicyToken resource:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning CannotUpdateExternalResource 6m54s managed/cloud.grafana.crossplane.io/v1alpha1, kind=accesspolicytoken failed to update the resource: [{0 409 Conflict 409 Conflict
{
  "code": "InvalidArgument",
  "message": "Field is required: region",
  "requestId": "9024b1f2-e154-4d74-a647-3a21182e8219"
} []}]
Warning CannotObserveExternalResource 49s (x11 over 6m53s) managed/cloud.grafana.crossplane.io/v1alpha1, kind=accesspolicytoken failed to observe the resource: [{0 error reading policy token with ID`us:d37c1ccd-8f60-4ae6-883f-33971463637a`: 404 Not Found []}]
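The ID in the observe error looks like a composite of the form `<region>:<token-id>`, which would explain how the GET and DELETE calls carry `region=us` while the recreate POST does not. The following Python sketch only illustrates that assumed format; `split_token_id` is a hypothetical helper, not the provider's actual code, and the ID layout is inferred from the event text alone.

```python
def split_token_id(composite_id):
    """Split an assumed '<region>:<token-id>' ID into (region, token_id).

    Raises ValueError when the separator or either part is missing, so an
    empty region is caught before any request is built from it.
    """
    region, sep, token_id = composite_id.partition(":")
    if not sep or not region or not token_id:
        raise ValueError(f"expected '<region>:<token-id>', got {composite_id!r}")
    return region, token_id


region, token_id = split_token_id("us:d37c1ccd-8f60-4ae6-883f-33971463637a")
print(region, token_id)
```

If the region were taken from such an ID for reads but from a separate, unset field for creates, the pattern in the logs would follow; that is a guess, not something confirmed here.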
Example existing AccessPolicyToken object (as part of a composition):
name: grafana-api-key
base:
  apiVersion: cloud.grafana.crossplane.io/v1alpha1
  kind: AccessPolicyToken
  spec:
    forProvider:
      region: us
      name: foo
      displayName: foo
      accessPolicyId: foo
    providerConfigRef:
      name: default
    writeConnectionSecretToRef:
      name: foo
      namespace: crossplane-system
This was seemingly introduced by https://github.com/grafana/crossplane-provider-grafana/pull/135
We're seeing this internally too; https://github.com/grafana/terraform-provider-grafana/pull/1886 is an attempt to fix it.
With the latest version I'm still seeing this (note the empty region in the second log line):
2025/01/22 19:17:42 [DEBUG] DELETE https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:17:42 [DEBUG] POST https://grafana.com/api/v1/tokens?region=
2025/01/22 19:17:44 [DEBUG] GET https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:17:49 [DEBUG] GET https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:17:57 [DEBUG] GET https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:18:14 [DEBUG] GET https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:18:14 [DEBUG] POST https://grafana.com/api/v1/tokens?region=prod-eu-west-2
2025/01/22 19:18:14 [DEBUG] GET https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:18:15 [DEBUG] GET https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:18:47 [DEBUG] GET https://grafana.com/api/v1/tokens/51fGUID3b?region=prod-eu-west-2
2025/01/22 19:18:47 [DEBUG] POST https://grafana.com/api/v1/tokens?region=prod-eu-west-2
2025/01/22 19:18:48 [DEBUG] GET https://grafana.com/api/v1/tokens/bf879GUID5997?region=prod-eu-west-2
2025/01/22 19:18:48 [DEBUG] GET https://grafana.com/api/v1/tokens/bf879GUID5997?region=prod-eu-west-2
2025/01/22 19:18:49 [DEBUG] GET https://grafana.com/api/v1/tokens/bf879GUID5997?region=prod-eu-west-2
2025/01/22 19:20:43 [DEBUG] GET https://grafana.com/api/v1/tokens/bf879GUID5997?region=prod-eu-west-2
2025/01/22 19:20:44 [DEBUG] GET https://grafana.com/api/v1/tokens/bf879GUID5997?region=prod-eu-west-2
apiVersion: cloud.grafana.crossplane.io/v1alpha1
kind: AccessPolicyToken
metadata:
  labels:
    testing.upbound.io/example-name: test
  name: prometheus-access-policy-token
spec:
  providerConfigRef:
    name: grafana-cloud-provider
  forProvider:
    accessPolicySelector:
      matchLabels:
        test.it/grafana-access-policy: prometheus
    displayName: Prometheus Access Policy Token
    name: prometheus-access-policy-token
    region: prod-eu-west-2
  writeConnectionSecretToRef:
    name: prometheus-access-policy-token
    namespace: grafana-cloud
---
apiVersion: cloud.grafana.crossplane.io/v1alpha1
kind: AccessPolicy
metadata:
  labels:
    test.it/grafana-access-policy: prometheus
  name: prometheus-access-policy
spec:
  providerConfigRef:
    name: grafana-cloud-provider
  forProvider:
    displayName: Prometheus Access Policy
    name: prometheus-access-policy
    realm:
      - identifier: "0000000" # Changed for github post
        type: stack
    region: prod-eu-west-2
    scopes:
      - logs:write
The Loki pods give:
ts=2025-01-22T18:49:04.582609283Z level=error msg="final error sending batch" component_path=/ component_id=loki.write.hostedlogs component=client host=logs-prod-012.grafana.net status=401 tenant="" error="server returned HTTP status 401 Unauthorized (401): {"status":"error","error":"authentication error: legacy auth cannot be upgraded because the host is not found"}"
I'm going to revert https://github.com/grafana/crossplane-provider-grafana/pull/135, as it causes a new bug instead of solving the original one.