mimir icon indicating copy to clipboard operation
mimir copied to clipboard

Helm <> Jsonnet: Fix read path differences

Open Logiraptor opened this issue 2 years ago • 8 comments

What this PR does

Draft PR to trigger the CI Helm <> Jsonnet diff

Which issue(s) this PR fixes or relates to

Fixes #2205

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Logiraptor avatar Sep 20 '22 18:09 Logiraptor

@dimitarvdimitrov: One of the differences is querier.max-concurrent set to 16 in Helm vs 20 (the default) in Jsonnet.

I see you added this in #2655. I see the comment

  # With query sharding we run more but smaller queries. We must strike a balance
  # which allows us to process more sharded queries in parallel when requested, but not overload
  # queriers during non-sharded queries.

Is there any more context on why 20 isn't a good value here?

Logiraptor avatar Sep 20 '22 18:09 Logiraptor

@pstibrany One difference is store-gateway.sharding-ring.wait-stability-min-duration. It defaults to 0, but is set to 1m by jsonnet. I can see that 1m used to be the default until your PR #976 moved it from a default in Go to a default in Jsonnet.

Can you explain why? I'd like to either:

  1. Revert it to being a default in the Go code and remove it from Jsonnet
  2. Add the 1m setting to Helm if it's generally useful
  3. Continue ignoring it if somehow Jsonnet is special, although this seems really unlikely

Logiraptor avatar Sep 20 '22 18:09 Logiraptor

@pracucci One difference is blocks-storage.bucket-store.max-chunk-pool-bytes. It defaults to 2GB in Helm and is overridden to 12GB in Jsonnet after your PR here: https://github.com/grafana/cortex-jsonnet/pull/322/files

Any reason I shouldn't port this into Mimir as a default in the Go code?

Logiraptor avatar Sep 20 '22 19:09 Logiraptor

blocks-storage.bucket-store.index-cache.memcached.max-item-size is 5MB in Jsonnet, 15MB in Helm. @ortuman , you most recently edited the Helm side in https://github.com/grafana/mimir/issues/2064, any idea why it's different or which one should be preferred?

Logiraptor avatar Sep 20 '22 19:09 Logiraptor

@dimitarvdimitrov: One of the differences is querier.max-concurrent set to 16 in Helm vs 20 (the default) in Jsonnet.

I see you added this in #2655. I see the comment

  # With query sharding we run more but smaller queries. We must strike a balance
  # which allows us to process more sharded queries in parallel when requested, but not overload
  # queriers during non-sharded queries.

Is there any more context on why 20 isn't a good value here?

Helm uses 16 when query sharding is enabled. Jsonnet uses the same value when query sharding is enabled. I think the difference is that jsonnet doesn't enable query sharding by default.

https://github.com/grafana/mimir/blob/74b03a6887a2354da699aef3a2353cd04de35b8b/operations/mimir/query-sharding.libsonnet#L67-L72

dimitarvdimitrov avatar Sep 21 '22 07:09 dimitarvdimitrov

@dimitarvdimitrov Thanks, I enabled it in the test now. More questions:

query-scheduler.max-outstanding-requests-per-tenant is set to 1600 in helm, but 800 in jsonnet when query sharding is enabled. Any idea which is more appropriate?

querier.max-query-parallelism is set to 224 in Helm, but 240 in Jsonnet. (14d vs 15d queries are targetted). Any reason Helm shouldn't use a 15d calculation and match?

Logiraptor avatar Sep 21 '22 19:09 Logiraptor

@dimitarvdimitrov Thanks, I enabled it in the test now. More questions:

query-scheduler.max-outstanding-requests-per-tenant is set to 1600 in helm, but 800 in jsonnet when query sharding is enabled. Any idea which is more appropriate?

I think I just didn't check the jsonnet for this one.

querier.max-query-parallelism is set to 224 in Helm, but 240 in Jsonnet. (14d vs 15d queries are targetted). Any reason Helm shouldn't use a 15d calculation and match?

I thought that 15d was not a time range that people usually run with, since Grafana offers 7d, 14d, 30d... ranges. But maybe I had overlooked some reasons for it being 15d in jsonnet.

I've opened a PR against this one to correct these numbers and set them to what jsonnet sets them to https://github.com/grafana/mimir/pull/3013. Thanks for bringing it up.

dimitarvdimitrov avatar Sep 22 '22 11:09 dimitarvdimitrov

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

pracucci avatar Oct 07 '22 09:10 pracucci