pulp-operator
Redis pod failing to write on mounted volume resulting in pulp-content returning 500
Version image: quay.io/pulp/pulp-operator:v1.0.0-beta.5 default pulp images.
Describe the bug After enabling cache, pulp-content fails with 500.
[2024-09-17 08:22:23 +0000] [52] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_protocol.py", line 456, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_app.py", line 537, in _handle
    resp = await handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_middlewares.py", line 114, in impl
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/authentication.py", line 48, in authenticate
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/instrumentation.py", line 230, in middleware
    resp = await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 346, in cached_function
    await self.auth(request, self, bk)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/handler.py", line 239, in auth_cached
    await cached.set(guard_key, str(guard), base_key=base_key)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 57, in wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 265, in set
    ret = await self.redis.hset(base_key, key, value)
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/client.py", line 615, in execute_command
    return await conn.retry.call_with_retry(
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
    return await do()
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/client.py", line 589, in _send_command_parse_response
    return await self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/client.py", line 636, in parse_response
    response = await connection.read_response()
  File "/usr/local/lib/python3.9/site-packages/redis/asyncio/connection.py", line 570, in read_response
    raise response from None
redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
::ffff:10.2.3.17 [17/Sep/2024:08:22:23 +0000] "GET /pulp/content/mongo-6/tst/ HTTP/1.1" 500 335 "https://pulp3.hostname.tldpulp/content/mongo-6/" "Mozilla/5.0 (X11; Linux x86_64; rv:130.0) Gecko/20100101 Firefox/130.0"
The cache pod is failing because it has insufficient privileges to write to the mounted volume.
$ kubectl exec pod/pulp-redis-6c86f8467-nwrbz -- /bin/ls -l /|grep data
drwxr-xr-x 3 root root 4096 Sep 16 16:49 data
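The listing shows /data owned by root:root with mode drwxr-xr-x (0755), so a process running as UID 999 with GID 999 falls into the "other" class and has no write bit. Below is a minimal Python sketch of that classic POSIX check. It ignores ACLs, capabilities and the root override, the function name is hypothetical, and the root:999 / 02775 values in the second call are only an assumption about what a group-writable volume could look like, not something observed in this cluster.

```python
import stat


def can_create_in_dir(mode: int, owner_uid: int, owner_gid: int,
                      uid: int, gids: set) -> bool:
    """Return True if (uid, gids) may create files in a directory.

    Creating a file needs both write and search (execute) permission
    on the directory, taken from exactly one permission class.
    """
    if uid == owner_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return mode & need == need


# /data as shown above: root:root, drwxr-xr-x
print(can_create_in_dir(0o755, 0, 0, uid=999, gids={999}))     # False

# Assumed layout if the volume were group-owned by 999 and group-writable
print(can_create_in_dir(0o2775, 0, 999, uid=999, gids={999}))  # True
```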
1:M 17 Sep 2024 10:30:00.009 * Background saving started by pid 189536
189536:C 17 Sep 2024 10:30:00.009 # Failed opening the temp RDB file temp-189536.rdb (in server root dir /data) for saving: Permission denied
1:M 17 Sep 2024 10:30:00.110 # Background saving error
1:M 17 Sep 2024 10:30:06.096 * 1 changes in 3600 seconds. Saving...
1:M 17 Sep 2024 10:30:06.097 * Background saving started by pid 189551
189551:C 17 Sep 2024 10:30:06.098 # Failed opening the temp RDB file temp-189551.rdb (in server root dir /data) for saving: Permission denied
1:M 17 Sep 2024 10:30:06.199 # Background saving error
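For anyone triaging a similar setup, the failed snapshot attempts can be pulled out of the Redis log mechanically. The sketch below is only an illustration: the regular expression is derived from the two "Failed opening the temp RDB file" lines above, and `parse_rdb_failure` and `RdbFailure` are hypothetical names, not part of Redis or Pulp.

```python
import re
from typing import NamedTuple, Optional


class RdbFailure(NamedTuple):
    temp_file: str   # e.g. temp-189536.rdb
    directory: str   # Redis working directory, e.g. /data
    reason: str      # OS error text, e.g. Permission denied


# Pattern modelled on the log lines quoted in this issue.
_PATTERN = re.compile(
    r"Failed opening the temp RDB file (\S+) "
    r"\(in server root dir (\S+)\) for saving: (.+)$"
)


def parse_rdb_failure(line: str) -> Optional[RdbFailure]:
    """Return details of a failed RDB save, or None for any other log line."""
    match = _PATTERN.search(line.strip())
    return RdbFailure(*match.groups()) if match else None


line = ("189536:C 17 Sep 2024 10:30:00.009 # Failed opening the temp RDB file "
        "temp-189536.rdb (in server root dir /data) for saving: Permission denied")
print(parse_rdb_failure(line))
```

Piping `kubectl logs` for the Redis pod through such a filter would list every failed save together with the directory Redis tried to write to.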
To allow the Redis user (UID 999, GID 999) to save to the mounted storage, the pod must have securityContext.fsGroup set to 999. I tried to enable this by editing the Pulp CR.
To Reproduce Set the following in the Pulp CR:
cache:
  enabled: true
  redis_storage_class: csi-cinder-high-speed
  securityContext:
    fsGroup: 999
$ kubectl apply -f pulp.yaml
strict decoding error: unknown field "spec.cache.securityContext"
Expected behavior proper securityContext is applied and Redis is able to save RDB file.
Additional context OVH Managed Kubernetes 1.30.2
Apparently, fsGroup should already be set, according to the Redis controller code here: https://github.com/pulp/pulp-operator/blob/26ac1d96aa977a426e27b05cb2a8251106561b60/controllers/repo_manager/redis.go#L367
When checking actual Pod config:
$ kubectl -n pulp get pod/pulp-redis-6c86f8467-nwrbz -o json| jq -r '.spec.securityContext'
{
  "runAsGroup": 999,
  "runAsUser": 999
}
$ kubectl -n pulp get pod/pulp-redis-6c86f8467-nwrbz -o json| jq -r '.spec.containers[0].securityContext'
{
  "allowPrivilegeEscalation": false,
  "capabilities": {
    "drop": [
      "ALL"
    ]
  },
  "runAsNonRoot": true,
  "seccompProfile": {
    "type": "RuntimeDefault"
  }
}
So fsGroup defined here https://github.com/pulp/pulp-operator/blob/26ac1d96aa977a426e27b05cb2a8251106561b60/controllers/repo_manager/redis.go#L337 does not get into actual Kubernetes deployment.
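The same check can be scripted against the pod JSON. This is a rough sketch, not part of the operator: `missing_fs_group` is a hypothetical helper, and `ISSUE_POD` only reproduces the pod-level securityContext printed above.

```python
import json

# Pod-level securityContext exactly as reported by kubectl above.
ISSUE_POD = json.loads("""
{
  "spec": {
    "securityContext": {
      "runAsGroup": 999,
      "runAsUser": 999
    }
  }
}
""")


def missing_fs_group(pod: dict, expected_gid: int) -> bool:
    """Return True if the pod-level securityContext lacks fsGroup == expected_gid."""
    security_context = pod.get("spec", {}).get("securityContext") or {}
    return security_context.get("fsGroup") != expected_gid


print(missing_fs_group(ISSUE_POD, 999))  # True: fsGroup never reached the pod
```

To run it against a live cluster, the output of `kubectl -n pulp get pod/<name> -o json` could be loaded with `json.load(sys.stdin)` in place of `ISSUE_POD`.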
Need to look into why User 999 is not allowed to write in the volume for the Redis image.
Who needs to look into this?
> Who needs to look into this?
This was a reminder to us devs when we were triaging the issue.
The same issue is affecting database pods created by the pulp operator.
@vkukk @danielbakken can you please check if https://github.com/pulp/pulp-operator/pull/1434 fixes this error?
@git-hyagi We decided to drop the idea of using pulp-operator because it is too buggy and unstable. There seems to be neither active development nor a proper understanding of how it should work. A pile of bugs I reported have been open since last year.
I've built my own k8s configuration around the Pulp multiprocess container and will not waste any more of my time on debugging and testing pulp-operator.
@git-hyagi We also dropped our plans to switch from Ansible Pulp to pulp-operator. We built our own simple RPM repository service instead.
This bug was blocking us, the project is still in beta, and, as @vkukk said, it does not appear to be actively developed.
Thank you for your time and feedback! I appreciate your effort in opening the issues and providing the information to get a better understanding of the problems. I am sorry to hear you are not going to use the operator anymore. Unfortunately, we didn't have much time to dedicate to the project this last year.
I am going to close this issue (considering it fixed by https://github.com/pulp/pulp-operator/pull/1434).