thanos icon indicating copy to clipboard operation
thanos copied to clipboard

Query Frontend: add query instant tripperware for query sharding

Open yeya24 opened this issue 2 years ago • 7 comments

Signed-off-by: Ben Ye [email protected]

  • [ ] I added CHANGELOG entry for this change.
  • [ ] Change is not relevant to the end user.

Changes

This pr adds a simple tripperware for sharding instant queries.

Verification

yeya24 avatar Aug 02 '22 07:08 yeya24

One question, do you think it makes sense to have separate sharding number for query range and query instant? It would mean we change --query-frontend.vertical-shards to

--query-range.vertical-shards
--query-instant.vertical-shards

which is a "breaking change", but we also haven't released the feature yet.

fpetkovski avatar Aug 02 '22 16:08 fpetkovski

One question, do you think it makes sense to have separate sharding number for query range and query instant? It would mean we change --query-frontend.vertical-shards to

--query-range.vertical-shards
--query-instant.vertical-shards

which is a "breaking change", but we also haven't released the feature yet.

What about using the same NumShards flag?

type Config struct {
	QueryRangeConfig
	LabelsConfig
	DownstreamTripperConfig

	CortexHandlerConfig    *transport.HandlerConfig
	CompressResponses      bool
	CacheCompression       string
	RequestLoggingDecision string
	DownstreamURL          string
	ForwardHeaders         []string
	NumShards              int
}

Now the NumShards is defined in the root level instead of inside QueryRangeConfig so I think it is totally okay to use it in instant queries. Maintaining two flags are also okay but I cannot think of a usecase for setting vertical shards differently for different query types.

yeya24 avatar Aug 02 '22 16:08 yeya24

Got it. Let's start with a single flag and we can revisit if a use case comes up.

fpetkovski avatar Aug 02 '22 16:08 fpetkovski

After we vendor the cortex package internally, is there a way to import the cortex proto file we vendored? Tried multiple times but didn't find a way to get it working with existing genproto.sh script.

yeya24 avatar Aug 03 '22 01:08 yeya24

After we vendor the cortex package internally, is there a way to import the cortex proto file we vendored? Tried multiple times but didn't find a way to get it working with existing genproto.sh script.

Yeah, I think it hasn't been updated. You probably need to add -I option that points to the Cortex directory.

GiedriusS avatar Aug 03 '22 07:08 GiedriusS

Still cannot get it working by adding the Cortex dir to the proto dependency. Wondering what's the future plan on this. Are we allowed to modify the vendored Cortex code for some use cases? We may need to build the Cortex proto ourselves.

yeya24 avatar Aug 09 '22 07:08 yeya24

Could you share the command that fails for you currently?

fpetkovski avatar Aug 09 '22 07:08 fpetkovski

Could you share the command that fails for you currently?

Thanks. I figured out a way to compile the Cortex protobuf instead and make changes on it directly. Not sure if it is a acceptable way but I got it working finally. Seems query response parsing and instant query sharding are working properly on my local tests.

I will add more tests soon.

yeya24 avatar Aug 13 '22 08:08 yeya24

This is ready for another round of review. @fpetkovski @GiedriusS

yeya24 avatar Aug 15 '22 16:08 yeya24

This looks good, my only concern is that we need to modify the vendored Cortex files. I am not sure though what's the strategy for keeping up to date with upstream Cortex.

Yeah, I see your concern. I am not sure about this as well. If needed I can try to avoid modifying the Cortex code. But this causes more code duplication. For example, we need to copy Cortex unexposed functions like merging query stats, etc.

yeya24 avatar Aug 17 '22 03:08 yeya24

Even though it's not ideal for syncing up with the upstream Cortex, we'll need to modify the vendored Cortex files. If I'm not mistaken, @GiedriusS also working on a PR that modifies the vendored files. It'll be a half-automated process to upgrade Cortex in Thanos, but I don't see any other options. If someone has a brilliant idea to do this, please let us know :)

kakkoyun avatar Aug 17 '22 04:08 kakkoyun

Amazing stuff, one of our instant queries went from 36s to 8s.

fpetkovski avatar Aug 18 '22 06:08 fpetkovski

Whoops, I didn't expect the PR got merged so quickly. I haven't added some unit tests and changelog, too. I will add them in follow up prs.

One important thing I want to make sure is to have PromQL compatibility test for Thanos Query behind the query frontend to make sure no regression. Since the pr is already large I will do it in another pr, too.

yeya24 avatar Aug 18 '22 06:08 yeya24