[ML] Enable chunking when performing an inference through the Inference API

Open dan-rubinstein opened this issue 1 year ago • 1 comment

Description

Currently, the Inference API runs chunking only when it is called as part of ingesting a large document into an index with an inference field. This change would allow users to run chunking when calling the Inference API directly to perform an inference. Users would control whether chunking runs through an optional flag added to the inference request (chunking_enabled), which would default to false.
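
For illustration only, here is a minimal Python sketch of what a request carrying the proposed flag might look like. It builds the request but does not send it. The placement of chunking_enabled at the top level of the request body is an assumption, since the issue only says the flag is added to the inference request. The helper name build_inference_request and the endpoint ID my-elser-endpoint are hypothetical.

```python
import json
from typing import Any, Dict, List, Union


def build_inference_request(
    inference_id: str,
    text: Union[str, List[str]],
    task_type: str = "sparse_embedding",
    chunking_enabled: bool = False,
) -> Dict[str, Any]:
    """Build (but do not send) an Inference API request.

    Hypothetical helper: `chunking_enabled` is the flag proposed in this
    issue, and putting it at the top level of the body is an assumption.
    """
    body: Dict[str, Any] = {"input": text}
    # The flag would default to false, so it is only included when the
    # caller opts in. Omitting it keeps the request identical to today's.
    if chunking_enabled:
        body["chunking_enabled"] = True
    return {
        "method": "POST",
        "path": f"/_inference/{task_type}/{inference_id}",
        "body": body,
    }


if __name__ == "__main__":
    request = build_inference_request(
        "my-elser-endpoint",  # hypothetical endpoint ID
        "A long document that may exceed the model's input limit ...",
        chunking_enabled=True,
    )
    print(request["method"], request["path"])
    print(json.dumps(request["body"], indent=2))
```

Leaving the flag out when it is false means existing callers would see no change in behavior, which matches the proposed default.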

Aug 26 '24 14:08 dan-rubinstein

Pinging @elastic/ml-core (Team:ML)

Aug 26 '24 14:08 elasticsearchmachine