elasticsearch
Support fields option on source-less indices
This adds support for fetching fields from doc values for a few field types:
- long
- int
- short
- byte
- double
- date
- boolean
- ip
- keyword
- unsigned long
- version string
- wildcard
- scaled float
We've had support for fetching from doc values forever using the `docvalue_fields` option and this just plugs into that when the source is disabled. It's possible that it's faster to plug into it in other cases but we don't know those cases yet.
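To make the behaviour concrete, here is a toy model of the fallback described above. It is only a sketch: `fetch_fields` and the `doc_values` key are hypothetical names invented for illustration, not Elasticsearch internals. The two dictionaries show what a source-less mapping and a `fields` request might look like.

```python
from typing import Any, Dict, List

# Mapping for an index with _source disabled, and a search body that uses
# the `fields` option. Field names here are made up for the example.
mapping = {
    "mappings": {
        "_source": {"enabled": False},
        "properties": {"status": {"type": "keyword"}, "bytes": {"type": "long"}},
    }
}
search_body = {"query": {"match_all": {}}, "fields": ["status", "bytes"]}


def fetch_fields(doc: Dict[str, Any], requested: List[str], source_enabled: bool) -> Dict[str, List[Any]]:
    """Toy fallback: read from _source when it exists, else from doc values.

    `doc` is a hypothetical in-memory stand-in for one stored document.
    Values are always returned as lists, as the fields API does.
    """
    store = doc.get("_source", {}) if source_enabled else doc.get("doc_values", {})
    result: Dict[str, List[Any]] = {}
    for name in requested:
        value = store.get(name)
        if value is None:
            continue  # fields with no value are simply absent from the response
        result[name] = value if isinstance(value, list) else [value]
    return result
```

With `source_enabled=False` the lookup never touches `_source`, which is the case this PR is meant to cover.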
It does that by creating a `ValueFetcherSource`, which is a place where we can attach additional fetch machinery. At this point it mostly just has the machinery for the `fields` fetch API and highlighters. But we believe we'll want this spot for TSDB's synthetic source project. It should also let us make the fields API more efficient when fetching small numbers of fields later.
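As a rough illustration of why a single attachment point could help when only a few fields are requested, here is a sketch in Python. Everything in it is hypothetical: `ValueSource`, `fetch`, and `prefer_doc_values` are invented names and do not mirror the real `ValueFetcherSource` methods. The idea shown is that `_source` is loaded lazily, so it is skipped entirely when doc values can answer the request.

```python
from typing import Any, Callable, Dict, List, Optional


class ValueSource:
    """Hypothetical stand-in for a per-document fetch context."""

    def __init__(self, load_source: Optional[Callable[[], Dict[str, Any]]],
                 doc_values: Dict[str, List[Any]]):
        self._load_source = load_source  # None models a source-less index
        self._doc_values = doc_values
        self._source: Optional[Dict[str, Any]] = None
        self.source_loads = 0  # counts how often _source was actually loaded

    def source(self) -> Dict[str, Any]:
        if self._load_source is None:
            return {}
        if self._source is None:
            self._source = self._load_source()
            self.source_loads += 1
        return self._source

    def doc_value(self, field: str) -> Optional[List[Any]]:
        return self._doc_values.get(field)


def fetch(vs: ValueSource, fields: List[str], prefer_doc_values: bool) -> Dict[str, List[Any]]:
    """Try doc values first when asked to, and fall back to _source."""
    out: Dict[str, List[Any]] = {}
    for name in fields:
        values = vs.doc_value(name) if prefer_doc_values else None
        if values is None:
            raw = vs.source().get(name)
            if raw is not None:
                values = raw if isinstance(raw, list) else [raw]
        if values is not None:
            out[name] = values
    return out
```

In this sketch, requesting only fields that have doc values never triggers a source load, while a field without doc values loads `_source` once and reuses it.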
Pinging @elastic/es-search (Team:Search)
I think this should replace #85185.
I see that the new abstraction allows consumers to pull fields from different data sources. I think it would help to have a high-level description of which component pulls what from where.
Sorry for the drive-by commenting on this issue. I read it and it looks like a great addition to the current "fields" behaviour. I was wondering how this plays along with CCS and some of its bwc guarantees? The way I understand it, the new behaviour returns certain fields from 8.2+ indices that we wouldn't from earlier indices, is that so? That means if a user requests a source-less field from both an 8.2 and an 8.1 cluster connected via CCS, some documents returned might not show that field. I think that's perfectly fine, I just want to raise it here as a consideration given we talked about CCS bwc a lot recently.
Having slept on this, the abstraction feels a little over-designed, maybe premature. Maybe not, because @romseygeek seemed to think it made sense with his plans. And it feels like it could be useful for synthetic source. OTOH, just having touched all the field mappers taught me something. So if we don't like this and want to kill it I won't feel too bad.
> I see that the new abstraction allows consumers to pull fields from different data sources. I think it would help to have a high-level description of which component pulls what from where.
@javanna Could you say that again in other words? I sort of half understand what you are getting at, and when I try to fill in the blanks I worry I'll get your meaning wrong.
> I was wondering how this plays along with CCS and some of its bwc guarantees? The way I understand it, the new behaviour returns certain fields from 8.2+ indices that we wouldn't from earlier indices, is that so? That means if a user requests a source-less field from both an 8.2 and an 8.1 cluster connected via CCS, some documents returned might not show that field. I think that's perfectly fine, I just want to raise it here as a consideration given we talked about CCS bwc a lot recently.
@cbuescher that's a really good question! One I hadn't thought about at all. Maybe it's more ok because it's on the data side? Meaning we'd be more willing to go source-less on indices in whatever version of Elasticsearch has this. And indices on versions that don't have it will still have _source? I dunno. You know a lot more about this stuff than I do.
Hi @nik9000, I've created a changelog YAML for you.
> Could you say that again in other words? I sort of half understand what you are getting at, and when I try to fill in the blanks I worry I'll get your meaning wrong.
Sorry @nik9000 for being cryptic. I meant that I could use a summary in the description of the PR about the three different supported scenarios (the three methods in the new abstraction) and what the consumers are for each one.
Sorry for my own drive-by comment, I found the logic in the last PR (#85185) a lot more focused and easy to understand. Would it make sense to start with that approach, then later see if we need this flexibility and "machinery" as your TSDB research evolves?
> Sorry for my own drive-by comment, I found the logic in the last PR (#85185) a lot more focused and easy to understand. Would it make sense to start with that approach, then later see if we need this flexibility and "machinery" as your TSDB research evolves?
I'd be fine with that. As of today I don't think tsdb is going to need this directly.
This all might be more useful once we have synthetic source. Then you could `preferDocValues` to avoid having to load `_source` at all.
@nik9000 shall we close this?
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Yeah. ESQL does this pretty effectively at this point. I wasn't going to get this up to date with `main`.