Ruler unable to list rules when s3 bucket uses percentage encoding
Describe the bug We are trying to set up Cortex on premises with an S3-compatible object store, Hitachi Content Platform (HCP). The Cortex ruler fails to read rules from the HCP bucket. When Cortex lists the rule groups, the object keys it gets back (e.g. bG9raS1ub2Rlcy1ydWxlcw== in the bucket) have the padding characters percent-encoded as %3D (e.g. bG9raS1ub2Rlcy1ydWxlcw%3D%3D), so base64 decoding of the key fails.
https://github.com/cortexproject/cortex/blob/347aacd2c836d5842db8ec972b40a26345b41d82/pkg/ruler/rulestore/bucketclient/bucket_client.go#L300
To reproduce the issue in code, I wrote this test:
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// Key segment as returned by the bucket listing, with "==" percent-encoded as %3D%3D.
	decodedNamespace, err := base64.URLEncoding.DecodeString("bG9raS1ub2Rlcy1ydWxlcw%3D%3D")
	// Round trip of the same namespace without percent-encoding, for comparison.
	encoded := base64.URLEncoding.EncodeToString([]byte("loki-nodes-rules"))
	decoded, err2 := base64.URLEncoding.DecodeString(encoded)
	fmt.Println(string(decodedNamespace), err)
	fmt.Println(string(decoded), err2)
}
Output:
loki-nodes-rule illegal base64 data at input byte 22
loki-nodes-rules <nil>
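One possible client-side workaround, sketched here only as an illustration, is to undo any percent-encoding before base64-decoding the key segment. `decodeNamespace` is a hypothetical helper and is not part of Cortex; whether the proper fix belongs in Cortex, in the S3 client library, or in the HCP configuration is not established by this report.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
)

// decodeNamespace is a hypothetical helper (not a Cortex function). It first
// reverses percent-encoding such as %3D, then base64-decodes the result.
func decodeNamespace(segment string) (string, error) {
	unescaped, err := url.PathUnescape(segment)
	if err != nil {
		return "", fmt.Errorf("unescape %q: %w", segment, err)
	}
	decoded, err := base64.URLEncoding.DecodeString(unescaped)
	if err != nil {
		return "", fmt.Errorf("base64-decode %q: %w", unescaped, err)
	}
	return string(decoded), nil
}

func main() {
	segments := []string{
		"bG9raS1ub2Rlcy1ydWxlcw%3D%3D", // as listed by the HCP bucket
		"bG9raS1ub2Rlcy1ydWxlcw==",     // as written by Cortex
	}
	for _, segment := range segments {
		namespace, err := decodeNamespace(segment)
		fmt.Println(namespace, err)
	}
}
```

Unescaping first is safe for keys that are not percent-encoded, because the URL-safe base64 alphabet never contains a % character.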
To Reproduce Steps to reproduce the behavior:
- Start Cortex 1.11.1 with single-process-config-blocks.yaml
- Set up an HCP bucket as the ruler storage
- Upload a sample rule:
./cortextool rules load ~/notes/paas/cortex-rules-alerts/ruler/loki-nodes-rules.yaml --address=http://<url>:9008 --id=nap-tom
- Check the logs for errors coming from bucket_client.go (see below for the log I received)
config.yaml
# Configuration for running Cortex in single-process mode.
# This should not be used in production. It is only for getting started
# and development.
# Disable the requirement that every request to Cortex has a
# X-Scope-OrgID header. `fake` will be substituted in instead.
auth_enabled: false
server:
  http_listen_port: 9008
  grpc_listen_port: 9099
  log_level: debug
  # Configure the server to allow messages up to 100MB.
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 1000
distributor:
  shard_by_all_labels: true
  pool:
    health_check_ingesters: true
ingester_client:
  grpc_client_config:
    # Configure the client to allow messages up to 100MB.
    max_recv_msg_size: 104857600
    max_send_msg_size: 104857600
    grpc_compression: gzip
ingester:
  lifecycler:
    # The address to advertise for this ingester. Will be autodiscovered by
    # looking up address on eth0 or en0; can be specified if this fails.
    # address: 127.0.0.1
    interface_names: [ens160]
    # We want to start immediately and flush on shutdown.
    join_after: 0
    min_ready_duration: 0s
    final_sleep: 0s
    num_tokens: 512
    # Use an in memory ring store, so we don't need to launch a Consul.
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
storage:
  engine: blocks
blocks_storage:
  tsdb:
    dir: /tmp/cortex/tsdb
  bucket_store:
    sync_dir: /tmp/cortex/tsdb-sync
  # You can choose between local storage and Amazon S3, Google GCS and Azure storage. Each option requires additional configuration
  # as shown below. All options can be configured via flags as well which might be handy for secret inputs.
  backend: s3 # s3, gcs, azure or filesystem are valid options
  s3:
    bucket_name: eu-cortex-metrics
    endpoint: url
    access_key_id: "user"
    secret_access_key: "password"
    #insecure: true
    #signature_version: "v2"
    http:
      insecure_skip_verify: true
compactor:
  data_dir: /tmp/cortex/compactor
  sharding_ring:
    kvstore:
      store: inmemory
frontend_worker:
  match_max_concurrent: true
ruler:
  enable_api: true
  enable_sharding: false
  rule_path: /tmp/cortex/tmp-rules
ruler_storage:
  backend: s3
  local:
    directory: /tmp/cortex/rules
  s3:
    bucket_name: eu-cortex-ruler
    endpoint: url
    access_key_id: "user"
    secret_access_key: "password"
    #insecure: true
    #signature_version: "v2"
    http:
      insecure_skip_verify: true
loki-nodes-rules.yaml
groups:
  - name: loki-nodes
    rules:
      - alert: loki-up
        expr: up{application="loki"} == 1
        labels:
          severity: MAJOR
        annotations:
          description: "Loki is not running on {{ $labels.hostname }}"
These are the logs I received:
level=warn ts=2022-04-15T16:21:04.619726789Z caller=bucket_client.go:147 msg="invalid rule group object key found while listing rule groups" user=fake key=bG9raS1ub2Rlcy1ydWxlcw%3D%3D/ err="illegal base64 data at input byte 22"
Expected behavior The ruler lists the rules without any errors.
Environment:
- Infrastructure: VMs
- Deployment tool: manual
Storage Engine
- [X] Blocks
- [ ] Chunks
Additional Context
What is this bG9raS1ub2Rlcy1ydWxlcw== object?
@alanprot That is the namespace encoded in base64; it corresponds to the filename of the rule file I was trying to upload to Cortex. In the bucket it is a folder that contains the rule group.
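To make the naming concrete, here is a minimal sketch of how such an object key could be assembled. The layout (tenant, then base64 namespace, then base64 rule group name) is an assumption inferred from the keys that appear in the logs in this thread, and `ruleGroupKey` is a hypothetical helper, not the Cortex implementation.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// ruleGroupKey is a hypothetical helper (not a Cortex function). It builds an
// object key of the assumed form <tenant>/<base64 namespace>/<base64 group>
// using the URL-safe base64 alphabet with padding.
func ruleGroupKey(tenant, namespace, group string) string {
	return strings.Join([]string{
		tenant,
		base64.URLEncoding.EncodeToString([]byte(namespace)),
		base64.URLEncoding.EncodeToString([]byte(group)),
	}, "/")
}

func main() {
	// Tenant, namespace and group names taken from this report.
	fmt.Println(ruleGroupKey("nap-tom", "loki-nodes-rules", "loki-nodes"))
}
```

The trailing == in each segment is base64 padding, and those are the characters that come back as %3D%3D from the HCP listing.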
Oh, OK.
So basically, for some reason the "Hitachi Content Platform" is percent-encoding the response, turning
bG9raS1ub2Rlcy1ydWxlcw==
into bG9raS1ub2Rlcy1ydWxlcw%3D%3D?
So I guess the question is: why is Hitachi encoding the response?
@alanprot
I checked whether the issue persists when the s3 config is defined inside the ruler: block, and there it seems to work.
Example:
auth_enabled: true
server:
  http_listen_port: 9008
  grpc_listen_port: 9099
  log_level: debug
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 1000
distributor:
  shard_by_all_labels: true
  pool:
    health_check_ingesters: true
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 104857600
    max_send_msg_size: 104857600
    grpc_compression: gzip
ingester:
  lifecycler:
    interface_names: [ens160]
    join_after: 0
    min_ready_duration: 0s
    final_sleep: 0s
    num_tokens: 512
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
storage:
  engine: blocks
blocks_storage:
  tsdb:
    dir: /tmp/cortex/tsdb
  bucket_store:
    sync_dir: /tmp/cortex/tsdb-sync
  backend: s3
  s3:
    bucket_name: eu-cortex-metrics
    endpoint: <endpoint>
    access_key_id: "<id>"
    secret_access_key: "<secret>"
    http:
      insecure_skip_verify: true
compactor:
  data_dir: /tmp/cortex/compactor
  sharding_ring:
    kvstore:
      store: inmemory
frontend_worker:
  match_max_concurrent: true
ruler:
  enable_api: true
  enable_sharding: false
  rule_path: /tmp/cortex/tmp-rules
  storage:
    type: s3
    s3:
      bucketnames: eu-cortex-ruler
      endpoint: <endpoint>
      access_key_id: "<id>"
      secret_access_key: "<secret>"
      http_config:
        insecure_skip_verify: true
I uploaded the rule with the same cortextool command and it gave no errors:
level=debug ts=2022-04-27T09:16:00.690149205Z caller=rule_store.go:147 msg="loading rule group" key="rules/nap-tom/bG9raS1ub2Rlcy1ydWxlcw==/bG9raS1ub2Rlcw==" user=nap-tom
If I switch to configuring the s3 bucket in the ruler_storage: block instead, the errors appear.
Example:
auth_enabled: true
server:
  http_listen_port: 9008
  grpc_listen_port: 9099
  log_level: debug
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 1000
distributor:
  shard_by_all_labels: true
  pool:
    health_check_ingesters: true
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 104857600
    max_send_msg_size: 104857600
    grpc_compression: gzip
ingester:
  lifecycler:
    interface_names: [ens160]
    join_after: 0
    min_ready_duration: 0s
    final_sleep: 0s
    num_tokens: 512
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
storage:
  engine: blocks
blocks_storage:
  tsdb:
    dir: /tmp/cortex/tsdb
  bucket_store:
    sync_dir: /tmp/cortex/tsdb-sync
  backend: s3
  s3:
    bucket_name: eu-cortex-metrics
    endpoint: <endpoint>
    access_key_id: "<id>"
    secret_access_key: "<secret>"
    http:
      insecure_skip_verify: true
compactor:
  data_dir: /tmp/cortex/compactor
  sharding_ring:
    kvstore:
      store: inmemory
frontend_worker:
  match_max_concurrent: true
ruler:
  enable_api: true
  enable_sharding: false
  rule_path: /tmp/cortex/tmp-rules
ruler_storage:
  backend: s3
  local:
    directory: /tmp/cortex/rules
  s3:
    bucket_name: eu-cortex-ruler
    endpoint: <endpoint>
    access_key_id: "<id>"
    secret_access_key: "<secret>"
    http:
      insecure_skip_verify: true
These are the logs I see:
level=warn ts=2022-04-27T09:33:00.421710256Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=nap-tom/ err="invalid rule group object key"
level=warn ts=2022-04-27T09:33:00.421725842Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=nap-tom/bG9raS1ub2Rlcy1ydWxlcw%3D%3D/ err="illegal base64 data at input byte 22"
level=warn ts=2022-04-27T09:33:00.421735648Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=nap-tom/bG9raS1ub2Rlcy1ydWxlcw%3D%3D/bG9raS1ub2Rlcw%3D%3D err="illegal base64 data at input byte 22"
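The two different error messages in these log lines can be reproduced with a small parser. This is only a sketch under assumptions: `parseKey` and `errInvalidKey` are hypothetical names, and the three-segment key layout is inferred from the logged keys, so it may differ from what bucket_client.go actually does.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// errInvalidKey mirrors the wording seen in the logs; the name is hypothetical.
var errInvalidKey = errors.New("invalid rule group object key")

// parseKey is a hypothetical parser (not the Cortex implementation). It expects
// <tenant>/<base64 namespace>/<base64 group> and decodes both encoded segments.
func parseKey(key string) (string, string, string, error) {
	parts := strings.Split(key, "/")
	if len(parts) != 3 {
		return "", "", "", errInvalidKey
	}
	namespace, err := base64.URLEncoding.DecodeString(parts[1])
	if err != nil {
		return "", "", "", err
	}
	group, err := base64.URLEncoding.DecodeString(parts[2])
	if err != nil {
		return "", "", "", err
	}
	return parts[0], string(namespace), string(group), nil
}

func main() {
	keys := []string{
		"nap-tom/", // tenant prefix only
		"nap-tom/bG9raS1ub2Rlcy1ydWxlcw%3D%3D/",                     // namespace prefix, percent-encoded
		"nap-tom/bG9raS1ub2Rlcy1ydWxlcw%3D%3D/bG9raS1ub2Rlcw%3D%3D", // full key, percent-encoded
		"nap-tom/bG9raS1ub2Rlcy1ydWxlcw==/bG9raS1ub2Rlcw==",         // full key, not percent-encoded
	}
	for _, key := range keys {
		tenant, namespace, group, err := parseKey(key)
		fmt.Printf("key=%q tenant=%q namespace=%q group=%q err=%v\n", key, tenant, namespace, group, err)
	}
}
```

In this sketch the prefix-only key fails the segment count check, the percent-encoded keys fail at the first % character (input byte 22 of the namespace segment), and only the key without percent-encoding parses cleanly.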
That looks like a Cortex issue
Hmm, interesting.
In the first case Cortex uses the AWS SDK to call S3:
https://github.com/cortexproject/cortex/blob/2177ec0c9eb6b1ceb7d8808d97945e6557055bb8/pkg/ruler/storage.go#L102
https://github.com/cortexproject/cortex/blob/2177ec0c9eb6b1ceb7d8808d97945e6557055bb8/pkg/chunk/aws/s3_storage_client.go#L382
And in the second case we are using minio:
https://github.com/cortexproject/cortex/blob/2177ec0c9eb6b1ceb7d8808d97945e6557055bb8/pkg/ruler/storage.go#L119
https://github.com/cortexproject/cortex/blob/2177ec0c9eb6b1ceb7d8808d97945e6557055bb8/vendor/github.com/thanos-io/thanos/pkg/objstore/s3/s3.go#L247
I wonder if this explains the difference in behaviour here.
Hi all,
I'm struggling with the upload of the YAML file to s3. What is the command that you use to upload the rules to s3? Thanks
I found the way to do it: cortextool rules sync --backend=loki --id=fake --rule-files=test1.yml --address=https://<LOKI_ADDRESS>