Proposal for splitting store into chunk/index store

Open harry671003 opened this issue 2 years ago • 5 comments

  • [ ] I added CHANGELOG entry for this change.
  • [X] Change is not relevant to the end user.

Changes

  • Adding a proposal for splitting store into index and chunk store.
  • Related issue: https://github.com/thanos-io/thanos/issues/6894

harry671003 avatar Nov 14 '23 19:11 harry671003

It would be great if we could get some 👀 on this proposal and hear feedback from the community.

Maybe @MichaHoffmann @saswatamcode @GiedriusS @fpetkovski?

yeya24 avatar Nov 17 '23 18:11 yeya24

Thanks for writing up the proposal, I agree that separating labels and chunks makes sense. However, I am not sure if that should be done through adding a separate store component. Have we considered sending chunk refs through the Series API so that the querier can fetch data directly from object storage?

fpetkovski avatar Nov 20 '23 12:11 fpetkovski

I will try to comment here. @harry671003 feel free to correct me if I am wrong.

For example, perhaps it even makes sense to implement these two Store modes as separate subcommands?

Yeah, I think that's possible, as long as we make sure not to duplicate too much code from the Store Gateway. There are pros and cons to separate commands versus a single command.

Maybe something like MatchRequest and MatchResponse sounds better?

We can certainly make the naming change if we agree MatchRequest is better. It makes sense to use a name other than Select.

I guess another question is about how chunks store will work with the external labels functionality that we have. Perhaps the chunks store could also advertise some external labelsets so that it would be possible to choose which chunk store(-s) should be used for requests depending on the matchers inside of the user's query?

IIUC, the external labels functionality is part of the block matching process? I think deciding which blocks to query is done at the Index Gateway. The Chunks Gateway is purely stateless: it just downloads chunk data for a given chunk ref and block ID.
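
To illustrate the stateless lookup described above, here is a minimal Go sketch. It is not taken from the proposal or from the Thanos codebase: `ChunkRequest`, `ObjectReader`, `memBucket`, and `FetchChunks` are hypothetical names, and an in-memory map stands in for object storage. The only point it tries to show is that each request carries everything needed (block ID plus chunk ref), so the component needs no block metadata or external labels of its own.

```go
package main

import (
	"errors"
	"fmt"
)

// ChunkRequest identifies one chunk by the block it lives in and its
// reference inside that block. Hypothetical type, for illustration only.
type ChunkRequest struct {
	BlockID  string // block identifier, kept as a plain string here
	ChunkRef uint64 // chunk reference resolved earlier from the block index
}

// ObjectReader is a hypothetical, minimal view of object storage.
type ObjectReader interface {
	ReadChunk(blockID string, ref uint64) ([]byte, error)
}

// ErrChunkNotFound is returned when the block or the chunk ref is unknown.
var ErrChunkNotFound = errors.New("chunk not found")

// memBucket is an in-memory stand-in for object storage: block ID -> ref -> bytes.
type memBucket map[string]map[uint64][]byte

func (b memBucket) ReadChunk(blockID string, ref uint64) ([]byte, error) {
	data, ok := b[blockID][ref] // indexing a missing block yields a nil map, so ok is false
	if !ok {
		return nil, fmt.Errorf("block %s ref %d: %w", blockID, ref, ErrChunkNotFound)
	}
	return data, nil
}

// FetchChunks keeps no state between calls: every request is answered
// purely from its (block ID, chunk ref) pair, in request order.
func FetchChunks(r ObjectReader, reqs []ChunkRequest) ([][]byte, error) {
	out := make([][]byte, 0, len(reqs))
	for _, req := range reqs {
		data, err := r.ReadChunk(req.BlockID, req.ChunkRef)
		if err != nil {
			return nil, err
		}
		out = append(out, data)
	}
	return out, nil
}

func main() {
	bucket := memBucket{
		"block-a": {8: []byte("chunk-8"), 16: []byte("chunk-16")},
	}
	chunks, err := FetchChunks(bucket, []ChunkRequest{
		{BlockID: "block-a", ChunkRef: 16},
		{BlockID: "block-a", ChunkRef: 8},
	})
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Println(string(c))
	}
}
```

Under this assumption, choosing which blocks to query (including any external label matching) would have to happen before the request reaches this component.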

Since the chunk size is capped AFAICT, it probably makes sense to avoid streaming altogether here 🤔 maybe worth highlighting this in the proposal?

Can you please elaborate on this a little more? I think I got the first part: chunk size is capped because of the TSDB format. But I didn't get how this avoids streaming.

Have we considered sending chunk refs through the Series API so that the querier can fetch data directly from object storage?

@fpetkovski We covered this in the proposal under the section "Querier fetches chunks directly". Maybe those points are not that strong, but I think we want to avoid changing the existing Series API in this proposal to reduce complexity. Happy to discuss more!

yeya24 avatar Nov 22 '23 08:11 yeya24

Can we get another round of review? @fpetkovski @GiedriusS @MichaHoffmann

yeya24 avatar Dec 04 '23 18:12 yeya24

Kindly ping for another review. @GiedriusS @fpetkovski @MichaHoffmann

yeya24 avatar Jan 07 '24 14:01 yeya24