helm-charts
Error when upgrading from 0.13.1 to latest version
What happened? After upgrading, all my nodes return this error:
2023-01-20 10:15:50,010 ERROR: create_config_service failed
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 890, in _create_config_service
if not self._api.create_namespaced_service(self._namespace, body):
File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 468, in wrapper
return getattr(self._core_v1_api, func)(*args, **kwargs)
File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 404, in wrapper
return self._api_client.call_api(method, path, headers, body, **kwargs)
File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 373, in call_api
return self._handle_server_response(response, _preload_content)
File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 203, in _handle_server_response
raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'e47d519f-f244-47a4-ad9f-201cbe928c4a', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f182df09-64d8-40dd-8faa-445be223320d', 'Date': 'Fri, 20 Jan 2023 10:15:50 GMT', 'Content-Length': '300'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services is forbidden: User \\"system:serviceaccount:opennms:timescaledb\\" cannot create resource \\"services\\" in API group \\"\\" in the namespace \\"stage\\"","reason":"Forbidden","details":{"kind":"services"},"code":403}\n'
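(Side note for anyone debugging this: the 403 means the pod's service account is missing the RBAC permission to create Services. A quick way to confirm it, reusing the service account and namespace names from the message above, is kubectl auth can-i:

kubectl auth can-i create services \
  --as=system:serviceaccount:opennms:timescaledb \
  -n stage

If this prints "no", the Role bound to that service account lacks the create verb on services.)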
Did you expect to see something different?
Yes, no errors on my timescaledb nodes
How to reproduce it (as minimally and precisely as possible):
- Use this timescaledb-ha image: pg12.13-ts2.9.1-latest
- Install the chart timescaledb-single at version 0.13.0
- Try to upgrade to the latest version or any other 0.1X.X release (see the command sketch below)
- And voilà! You get the same error on the nodes
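For reference, a minimal command sequence for the steps above might look like this (repo alias, release name, and namespace are illustrative; the pod name in the last command depends on your release):

helm repo add timescale https://charts.timescale.com
helm repo update
helm install stage timescale/timescaledb-single --version 0.13.0 -n stage
helm upgrade stage timescale/timescaledb-single -n stage
kubectl logs stage-timescaledb-0 -n stage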
Environment
- Which helm chart and what version are you using?
timescaledb-single, 0.13.1 at the start; I am trying to upgrade to 0.30.0
- What is in your values.yaml?
affinity: {}
backup:
  enabled: true
  env: null
  envFrom: null
  jobs:
    - name: full-weekly
      schedule: 12 02 * * 0
      type: full
    - name: incremental-daily
      schedule: 12 02 * * 1-6
      type: incr
  pgBackRest:
    compress-type: lz4
    process-max: 4
    repo1-cipher-type: none
    repo1-retention-diff: 2
    repo1-retention-full: 2
    repo1-s3-endpoint: s3.amazonaws.com
    repo1-s3-region: eu-west-1
    repo1-type: s3
    start-fast: 'y'
  pgBackRest:archive-get: {}
  pgBackRest:archive-push: {}
  resources: {}
bootstrapFromBackup:
  enabled: false
  repo1-path: null
  secretName: pgbackrest-bootstrap
callbacks:
  configMap: null
clusterName: stage
debug:
  execStartPre: null
env:
  - name: TIMESCALEDB_TELEMETRY
    value: 'off'
envFrom: null
fullnameOverride: '{{ .Release.Name }}'
image:
  pullPolicy: Always
  repository: timescale/timescaledb-ha
  tag: pg12.13-ts2.9.1-latest
networkPolicy:
  enabled: false
  ingress: null
  prometheusApp: prometheus
nodeSelector: {}
patroni:
  bootstrap:
    dcs:
      loop_wait: 10
      maximum_lag_on_failover: 33554432
      postgresql:
        parameters:
          archive_command: /etc/timescaledb/scripts/pgbackrest_archive.sh %p
          archive_mode: 'on'
          archive_timeout: 1800s
          autovacuum_analyze_scale_factor: 0.02
          autovacuum_max_workers: 10
          autovacuum_naptime: 5s
          autovacuum_vacuum_cost_limit: 500
          autovacuum_vacuum_scale_factor: 0.05
          hot_standby: 'on'
          log_autovacuum_min_duration: 1min
          log_checkpoints: 'on'
          log_connections: 'on'
          log_disconnections: 'on'
          log_line_prefix: '%t [%p]: [%c-%l] %u@%d,app=%a [%e] '
          log_lock_waits: 'on'
          log_min_duration_statement: 1s
          log_statement: ddl
          max_connections: 100
          max_prepared_transactions: 150
          shared_preload_libraries: timescaledb,pg_stat_statements
          ssl: 'on'
          ssl_cert_file: /etc/certificate/tls.crt
          ssl_key_file: /etc/certificate/tls.key
          tcp_keepalives_idle: 900
          tcp_keepalives_interval: 100
          temp_file_limit: 1GB
          timescaledb.passfile: ../.pgpass
          unix_socket_directories: /var/run/postgresql
          unix_socket_permissions: '0750'
          wal_level: hot_standby
          wal_log_hints: 'on'
          max_locks_per_transaction: 2200
          max_parallel_workers: 14
          max_worker_processes: 32
          timescaledb.max_background_workers: 16
        use_pg_rewind: true
        use_slots: true
      retry_timeout: 10
      ttl: 30
    method: restore_or_initdb
    post_init: /etc/timescaledb/scripts/post_init.sh
    restore_or_initdb:
      command: >
        /etc/timescaledb/scripts/restore_or_initdb.sh --encoding=UTF8
        --locale=C.UTF-8
      keep_existing_recovery_conf: true
  kubernetes:
    role_label: role
    scope_label: cluster-name
    use_endpoints: true
  log:
    level: WARNING
  postgresql:
    authentication:
      replication:
        username: standby
      superuser:
        username: postgres
    basebackup:
      - waldir: /var/lib/postgresql/wal/pg_wal
    callbacks:
      on_reload: /etc/timescaledb/scripts/patroni_callback.sh
      on_restart: /etc/timescaledb/scripts/patroni_callback.sh
      on_role_change: /etc/timescaledb/scripts/patroni_callback.sh
      on_start: /etc/timescaledb/scripts/patroni_callback.sh
      on_stop: /etc/timescaledb/scripts/patroni_callback.sh
    create_replica_methods:
      - pgbackrest
      - basebackup
    listen: 0.0.0.0:5432
    pg_hba:
      - local all postgres peer
      - local all all md5
      - hostssl all all 127.0.0.1/32 md5
      - hostssl all all ::1/128 md5
      - hostssl replication standby all md5
      - hostssl all all all md5
      - host all all all md5
    pgbackrest:
      command: /etc/timescaledb/scripts/pgbackrest_restore.sh
      keep_data: true
      no_master: true
      no_params: true
    recovery_conf:
      restore_command: /etc/timescaledb/scripts/pgbackrest_archive_get.sh %f "%p"
    use_unix_socket: true
  restapi:
    listen: 0.0.0.0:8008
persistentVolumes:
  data:
    accessModes:
      - ReadWriteOnce
    annotations: {}
    enabled: true
    mountPath: /var/lib/postgresql
    size: 25Gi
    subPath: ''
  wal:
    accessModes:
      - ReadWriteOnce
    annotations: {}
    enabled: true
    mountPath: /var/lib/postgresql/wal
    size: 5Gi
    storageClass: null
    subPath: ''
pgBouncer:
  config:
    default_pool_size: 12
    max_client_conn: 500
    pool_mode: transaction
    server_reset_query: DISCARD ALL
    client_tls_sslmode: prefer
    ignore_startup_parameters: extra_float_digits
  enabled: true
  pg_hba:
    - local all postgres peer
    - host all postgres,standby 0.0.0.0/0 reject
    - host all postgres,standby ::0/0 reject
    - hostssl all all 0.0.0.0/0 md5
    - hostssl all all ::0/0 md5
    - host all all 0.0.0.0/0 md5
    - host all all ::0/0 md5
  port: 6432
  userListSecretName: null
podAnnotations: {}
podLabels: {}
podManagementPolicy: OrderedReady
podMonitor:
  enabled: false
  interval: 10s
  path: /metrics
postInit:
  - configMap:
      name: custom-init-scripts
      optional: true
  - secret:
      name: custom-secret-scripts
      optional: true
prometheus:
  args: []
  enabled: false
  env: null
  image:
    pullPolicy: Always
    repository: quay.io/prometheuscommunity/postgres-exporter
    tag: v0.11.1
  volumeMounts: null
  volumes: null
rbac:
  create: true
readinessProbe:
  enabled: true
  failureThreshold: 6
  initialDelaySeconds: 5
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 5
replicaCount: 3
resources: {}
secrets:
  certificate:
    tls.crt: ''
    tls.key: ''
  certificateSecretName: certificate
  credentials:
    PATRONI_REPLICATION_PASSWORD: ''
    PATRONI_SUPERUSER_PASSWORD: ''
    PATRONI_admin_PASSWORD: ''
  credentialsSecretName: credentials
  pgbackrest:
    PGBACKREST_REPO1_S3_BUCKET: ''
    PGBACKREST_REPO1_S3_ENDPOINT: s3.amazonaws.com
    PGBACKREST_REPO1_S3_KEY: ''
    PGBACKREST_REPO1_S3_KEY_SECRET: ''
    PGBACKREST_REPO1_S3_REGION: ''
  pgbackrestSecretName: pgbackrest
service:
  primary:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '4000'
      service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
    labels: {}
    nodePort: null
    port: 5432
    spec: {}
    type: LoadBalancer
  replica:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '4000'
      service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
    labels: {}
    nodePort: null
    port: 5432
    spec: {}
    type: LoadBalancer
serviceAccount:
  annotations: {}
  create: true
  name: null
sharedMemory:
  useMount: true
timescaledbTune:
  args: {}
  enabled: true
tolerations: []
topologySpreadConstraints: []
version: null
global:
  cattle:
    systemProjectId: p-rg64l
  loadBalancer:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
    enabled: false
  replicaLoadBalancer:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
- Kubernetes version information:
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.8", GitCommit:"4a3b558c52eb6995b3c5c1db5e54111bd0645a64", GitTreeState:"clean", BuildDate:"2021-12-15T14:52:11Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.15-eks-fb459a0", GitCommit:"be82fa628e60d024275efaa239bfe53a9119c2d9", GitTreeState:"clean", BuildDate:"2022-10-24T20:33:23Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind:
Rancher
Anything else we need to know?: I saw the same error here: https://github.com/timescale/helm-charts/issues/405. That thread is marked as resolved, but no real solution was proposed there.
I saw that the problem is with Patroni, but I don't know whether the timescaledb-ha images that still use PostgreSQL 12 are patched.
This error is a Patroni error that is fixed here. That Patroni fix was merged into the timescaledb-ha image version timescale/timescaledb-ha:pg13.9-ts2.9.2-p0. You will need to update your timescaledb-ha image for that message to go away.
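(In values.yaml terms that would mean overriding the image block with the tag mentioned above and re-running helm upgrade; a minimal sketch, assuming you can move to a PG13 image:

image:
  pullPolicy: Always
  repository: timescale/timescaledb-ha
  tag: pg13.9-ts2.9.2-p0
)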
@nhudson thanks, I will test that. Does this image have the PostgreSQL 12 version on it (by setting the correct parameter in the values.yaml)?
The same error here (with the newest version of the chart, 0.33.1). I did not want to use another image, so I had to create a Role and RoleBinding with these values:
cat <<EOF | kubectl apply --context <your_context> -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: <your_namespace>
  name: timescaledb-patch-to-remove
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["services"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: timescaledb-patch-binding
  namespace: <your_namespace>
subjects:
  - kind: ServiceAccount
    name: <your_name>
    namespace: <your_namespace>
roleRef:
  kind: Role
  name: timescaledb-patch-to-remove
  apiGroup: rbac.authorization.k8s.io
EOF
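(After applying the above, you can check that the permission is now in place, substituting your own namespace and service account name:

kubectl auth can-i create services \
  -n <your_namespace> \
  --as=system:serviceaccount:<your_namespace>:<your_name>
)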
I had the same issue with
repository: timescale/timescaledb-ha
tag: pg12.13-ts2.9.2-p1
It was fixed by @tomislater's patch.
@nhudson It's not normal to have to use a specific image with a patch that is not deployed in all new timescaledb-ha tags. Do we have a date, or any indication, of when and whether this will be rolled out to all future timescaledb-ha tags?
Same error with pg14.6-ts2.9.2-p0 and pg14.6-ts2.9.1-patroni-static-primary-p2. Is there any pg14.6 image that fixes this problem?
Any news about publishing the Patroni patch in all new timescaledb-ha images, so that we don't have to apply the patch manually or use a very specific timescaledb-ha image? @nhudson
Same error with timescaledb-ha:pg12.14-ts2.9.3-latest, published 4 days ago 😢
@tomislater's patch fixes the problem, but it's not a real solution: normally, with a Helm chart, we shouldn't have to apply such patches ourselves. Would it be possible to add this directly to the chart to fix the problem in the next version? @feikesteenbergen
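(For reference, the chart-side change being asked for here would presumably boil down to the chart's own managed Role, created when rbac.create is true, carrying a rule like the one from the manual patch, so nobody has to apply it by hand. A sketch, not the actual chart template:

- apiGroups: [""]
  resources: ["services"]
  verbs: ["create"]
)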
@throrin19 looks like we are waiting on the Patroni images to become available in the Debian repo. https://github.com/timescale/timescaledb-docker-ha/pull/343
Has Patroni been updated in the timescaledb-ha image since a month ago?
This problem seems to be blocking for most current and future users of this chart, but no solution seems to be arriving quickly (except the Role tweak, which has to be applied manually and is not available in the chart itself).
I tried with the latest image that is currently available (pg14.6-ts2.9.3-patroni-dcs-failsafe-p0) and the problem is resolved!
@GeorgFleig: That's not a final image, it's a specific tag with the fix. OK, it resolves the problem, but it's not the solution. The solution is: with the default tags used in this chart, Patroni should work fine on Kubernetes without any patch.
Still getting this error in the timescaledb pod logs.
Any news on when the container images that include the Patroni fix will be referenced in the Helm chart?
@pfrydids You can forget about this; the project seems to be unmaintained...
@throrin19 oh no! A real shame as it seems to be working quite well otherwise
@pfrydids I use it in my organisation, but nobody has had a response to their issues for about 6 months.
Well, that's a shame. I was integrating this into my cluster and saw this issue occurring, but if the project is dead I'll have to look for an alternative. :(
@DreamwareDevelopment you can fix that with the Role and RoleBinding patch published by @tomislater: https://github.com/timescale/helm-charts/issues/554#issuecomment-1406256190