Support fields option on source-less indices

Open · nik9000 opened this issue 2 years ago · 10 comments

This adds support for fetching fields from doc values for a few field types:

  • long
  • int
  • short
  • byte
  • double
  • date
  • boolean
  • ip
  • keyword
  • unsigned long
  • version string
  • wildcard
  • scaled float

We've long supported fetching from doc values through the docvalue_fields option, and this change plugs the fields option into that machinery when _source is disabled. Plugging into it may be faster in other cases too, but we haven't identified those cases yet.
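A minimal Python sketch of the routing rule described above follows. It assumes that, with _source disabled, a requested field is either served from doc values (if its type is on the supported list and it has doc values) or simply not returned. The names DOC_VALUE_TYPES, FetchFrom and fetch_strategy are hypothetical, and the type strings are the Elasticsearch mapping names for the list entries (int as integer, unsigned long as unsigned_long, version string as version, scaled float as scaled_float). This is an illustration, not the PR's Java implementation.

```python
from enum import Enum

# Hypothetical: the field types the PR lists as fetchable from doc values,
# written as Elasticsearch mapping type names.
DOC_VALUE_TYPES = frozenset({
    "long", "integer", "short", "byte", "double", "date", "boolean",
    "ip", "keyword", "unsigned_long", "version", "wildcard", "scaled_float",
})


class FetchFrom(Enum):
    SOURCE = "source"
    DOC_VALUES = "doc_values"
    UNAVAILABLE = "unavailable"


def fetch_strategy(field_type: str, source_enabled: bool,
                   has_doc_values: bool = True) -> FetchFrom:
    """Hypothetical routing for one field requested via the fields option."""
    if source_enabled:
        # Existing behaviour: the fields option reads values from _source.
        return FetchFrom.SOURCE
    if field_type in DOC_VALUE_TYPES and has_doc_values:
        # New behaviour: reuse the docvalue_fields machinery instead.
        return FetchFrom.DOC_VALUES
    # No _source and no doc-value fetcher for this type: nothing to return.
    return FetchFrom.UNAVAILABLE


if __name__ == "__main__":
    print(fetch_strategy("keyword", source_enabled=False).value)  # doc_values
    print(fetch_strategy("text", source_enabled=False).value)     # unavailable
    print(fetch_strategy("text", source_enabled=True).value)      # source
```

Keeping the decision in one place is what lets the fields option reuse docvalue_fields without changing behaviour for indices that still have _source.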

It does this by introducing a ValueFetcherSource, a place where we can attach additional fetch machinery. For now it mostly holds the machinery for the fields fetch API and for highlighters, but we expect to want this hook for TSDB's synthetic source project, and later to make the fields API more efficient when fetching a small number of fields.
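To make the idea of "a place to attach fetch machinery" concrete, here is a small Python sketch. It borrows the name ValueFetcherSource from the PR, but its single values method, the two implementations (SourceValues, DocValues) and the toy fields_api consumer are hypothetical and do not reflect the actual Java classes. _source and doc values are both modelled as plain dicts.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ValueFetcherSource(ABC):
    """Hypothetical shape: answers 'values of field F for document D'
    regardless of where those values are stored."""

    @abstractmethod
    def values(self, field: str, doc_id: int) -> List[Any]:
        ...


class SourceValues(ValueFetcherSource):
    """Reads from each document's stored _source (modelled as dicts)."""

    def __init__(self, sources: Dict[int, Dict[str, Any]]):
        self._sources = sources

    def values(self, field: str, doc_id: int) -> List[Any]:
        value = self._sources.get(doc_id, {}).get(field)
        if value is None:
            return []
        return value if isinstance(value, list) else [value]


class DocValues(ValueFetcherSource):
    """Reads from per-field columns, the way doc values are laid out."""

    def __init__(self, columns: Dict[str, Dict[int, List[Any]]]):
        self._columns = columns

    def values(self, field: str, doc_id: int) -> List[Any]:
        return list(self._columns.get(field, {}).get(doc_id, []))


def fields_api(fetcher: ValueFetcherSource, doc_id: int,
               fields: List[str]) -> Dict[str, List[Any]]:
    # A toy consumer: it only sees the abstraction, so it behaves the same
    # whether or not the index keeps _source. Empty fields are omitted.
    out: Dict[str, List[Any]] = {}
    for name in fields:
        vals = fetcher.values(name, doc_id)
        if vals:
            out[name] = vals
    return out
```

A highlighter would be a second consumer of the same interface. One visible difference in practice is that multi-valued keyword fields read from doc values come back sorted and deduplicated rather than in _source order; the toy DocValues class does not model that.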

nik9000, Mar 24 '22

Pinging @elastic/es-search (Team:Search)

elasticmachine, Mar 29 '22

I think this should replace #85185.

nik9000, Mar 29 '22

I see that the new abstraction allows consumers to pull fields from different data sources. I think it would help to have a high-level description on which component pulls what from.

javanna, Mar 30 '22

Sorry for the drive-by comment on this issue. I read it, and it looks like a great addition to the current "fields" behaviour. I was wondering how this plays along with CCS and some of its bwc guarantees? The way I understand it, the new behaviour returns certain fields from 8.2+ indices that we wouldn't return from earlier indices, is that so? That means if a user requests a source-less field from both an 8.2 and an 8.1 cluster connected via CCS, some documents returned might not show that field. I think that's perfectly fine; I just want to raise it here as a consideration, given we've talked about CCS bwc a lot recently.

cbuescher, Mar 30 '22

Having slept on it, the abstraction feels a little over-designed, maybe premature. Maybe not, though, because @romseygeek seemed to think it made sense with his plans, and it feels like it could be useful for synthetic source. OTOH, just touching all the field mappers taught me something, so if we don't like this and want to kill it, I won't feel too bad.

> I see that the new abstraction allows consumers to pull fields from different data sources. I think it would help to have a high-level description on which component pulls what from.

@javanna Could you say that again in other words? I sort of half understand what you're getting at, and when I try to fill in the blanks I worry I'll get your meaning wrong.

> I was wondering how this plays along with CCS and some of its bwc guarantees? The way I understand it, the new behaviour returns certain fields from 8.2+ indices that we wouldn't return from earlier indices, is that so? That means if a user requests a source-less field from both an 8.2 and an 8.1 cluster connected via CCS, some documents returned might not show that field. I think that's perfectly fine; I just want to raise it here as a consideration, given we've talked about CCS bwc a lot recently.

@cbuescher that's a really good question! One I hadn't thought about at all. Maybe it's more OK because it's on the data side? Meaning we'd be more willing to go source-less on indices in whatever version of Elasticsearch has this, and indices on versions that don't have it would still have _source? I dunno. You know a lot more about this stuff than I do.

nik9000, Mar 30 '22

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine, Mar 30 '22

> Could you say that again in other words? I sort of half understand what you're getting at, and when I try to fill in the blanks I worry I'll get your meaning wrong.

Sorry for being cryptic, @nik9000. I meant that it would help to have a summary in the PR description of the three supported scenarios (the three methods in the new abstraction) and of the consumers of each one.

javanna, Mar 30 '22

Sorry for my own drive-by comment: I found the logic in the previous PR (#85185) a lot more focused and easier to understand. Would it make sense to start with that approach, then see later whether we need this flexibility and "machinery" as your TSDB research evolves?

jtibshirani, Mar 30 '22

> Sorry for my own drive-by comment: I found the logic in the previous PR (#85185) a lot more focused and easier to understand. Would it make sense to start with that approach, then see later whether we need this flexibility and "machinery" as your TSDB research evolves?

I'd be fine with that. As of today I don't think TSDB is going to need this directly.

This all might be more useful once we have synthetic source. Then you could preferDocValues to avoid having to load _source at all.

nik9000, Mar 30 '22

@nik9000 shall we close this?

javanna, May 14 '24

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine, Jul 17 '24

Yeah. ESQL does this pretty effectively at this point. I wasn't going to get this up to date with main.

nik9000, Jul 19 '24