Proposal for splitting store into chunk/index store
- [ ] I added CHANGELOG entry for this change.
- [X] Change is not relevant to the end user.
Changes
- Adding a proposal for splitting store into index and chunk store.
- Related issue: https://github.com/thanos-io/thanos/issues/6894
It would be great if we could get some 👀 on this proposal and hear feedback from the community.
Maybe @MichaHoffmann @saswatamcode @GiedriusS @fpetkovski?
Thanks for writing up the proposal, I agree that separating labels and chunks makes sense. However, I am not sure if that should be done through adding a separate store component. Have we considered sending chunk refs through the Series API so that the querier can fetch data directly from object storage?
I will try to comment here. @harry671003 feel free to correct me if I am wrong.
For example, perhaps it even makes sense to implement these two Store modes as separate subcommands?
Yeah, I think that's possible, as long as we make sure not to duplicate too much code from Store Gateway. There are pros and cons to separate commands versus a single command.
Maybe something like MatchRequest and MatchResponse sounds better?
We can totally make the naming change if we agree MatchRequest is better. It makes sense to use a name other than Select.
I guess another question is how the chunks store will work with the external labels functionality that we have. Perhaps the chunks store could also advertise some external labelsets so that it would be possible to choose which chunk store(-s) should be used for requests depending on the matchers inside of the user's query?
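To make the idea concrete, here is a rough sketch of how a caller could pick chunk stores by advertised external labels. Everything in it is an assumption for illustration: the `ChunkStore` and `Matcher` types and the `selectStores` function are hypothetical and not part of Thanos, and only equality matchers are handled.

```go
package main

import "fmt"

// Matcher is a hypothetical equality-only matcher (name="value").
type Matcher struct {
	Name  string
	Value string
}

// ChunkStore is a hypothetical chunk store endpoint together with
// the external labels it advertises.
type ChunkStore struct {
	Addr           string
	ExternalLabels map[string]string
}

// mayMatch reports whether a store with the given external labels could
// hold data for the matchers. A store is ruled out only when it advertises
// a label that a matcher requires to have a different value.
func mayMatch(ext map[string]string, matchers []Matcher) bool {
	for _, m := range matchers {
		if v, ok := ext[m.Name]; ok && v != m.Value {
			return false
		}
	}
	return true
}

// selectStores keeps the stores that cannot be ruled out by the matchers.
func selectStores(stores []ChunkStore, matchers []Matcher) []ChunkStore {
	var out []ChunkStore
	for _, s := range stores {
		if mayMatch(s.ExternalLabels, matchers) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	stores := []ChunkStore{
		{Addr: "chunks-eu:10901", ExternalLabels: map[string]string{"cluster": "eu"}},
		{Addr: "chunks-us:10901", ExternalLabels: map[string]string{"cluster": "us"}},
	}
	for _, s := range selectStores(stores, []Matcher{{Name: "cluster", Value: "eu"}}) {
		fmt.Println(s.Addr)
	}
}
```

Matchers on labels that a store does not advertise do not exclude it, since those labels may belong to the series themselves.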
IIUC, external labels functionality is part of the blocks matching process? I think deciding which blocks to query is done at the Index Gateway. The Chunks Gateway is purely stateless: it just downloads chunk data for a given chunk ref and block ID.
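As a minimal sketch of why a block ID plus a chunk ref is enough, the snippet below maps the pair to an object key and byte offset. It assumes the Prometheus TSDB block layout, where the upper 4 bytes of a chunk ref are the segment file index and the lower 4 bytes are the offset within that file, and where segment files live under `<block ID>/chunks/` with six-digit names starting at `000001`. The `ChunkLocation` type and `locateChunk` function are hypothetical names, not existing Thanos code.

```go
package main

import "fmt"

// ChunkLocation is a hypothetical description of where a chunk lives
// in object storage.
type ChunkLocation struct {
	ObjectKey string // e.g. <block ID>/chunks/000001
	Offset    uint32 // byte offset of the chunk inside the segment file
}

// locateChunk derives the location from the block ID and chunk ref alone.
// Assumption: segment index N (zero-based) maps to the file named N+1.
func locateChunk(blockID string, ref uint64) ChunkLocation {
	segment := int(ref >> 32) // upper 4 bytes: segment file index
	offset := uint32(ref)     // lower 4 bytes: offset within the file
	return ChunkLocation{
		ObjectKey: fmt.Sprintf("%s/chunks/%06d", blockID, segment+1),
		Offset:    offset,
	}
}

func main() {
	// The block ID here is a placeholder, not a real ULID.
	ref := uint64(2)<<32 | 4096
	loc := locateChunk("example-block-id", ref)
	fmt.Printf("%s @ %d\n", loc.ObjectKey, loc.Offset)
}
```

Because the lookup depends only on its inputs, any gateway replica can serve any request without holding block metadata or index headers.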
Since the chunk size is capped AFAICT, it probably makes sense to avoid streaming altogether here 🤔 maybe worth highlighting this in the proposal?
Could you please elaborate on this a little bit more? I think I got the first part: chunk size is capped because of the TSDB format. But I didn't get how this avoids streaming.
Have we considered sending chunk refs through the Series API so that the querier can fetch data directly from object storage?
@fpetkovski We covered this in the proposal under the section "Querier fetches chunks directly". Maybe those points are not that strong, but I think we want to avoid changing the existing Series API in this proposal to reduce complexity. Happy to discuss more!
Can we get another round of review? @fpetkovski @GiedriusS @MichaHoffmann
Gentle ping for another review. @GiedriusS @fpetkovski @MichaHoffmann