[ML] Enable chunking when performing an inference through the Inference API
Description
Currently, the Inference API runs chunking only when it is called as part of ingesting a large document into an index with an inference field. This change would let users run chunking when they call the Inference API directly to perform an inference. Users would control whether chunking runs through an optional flag added to the inference request (chunking_enabled), which defaults to false.
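To make the proposal concrete, here is a minimal sketch of what a request with the new flag might look like. It only builds the request and sends nothing. The `chunking_enabled` name comes from this issue, but its placement in the request body (rather than, say, as a query parameter) is an assumption, as are the helper name `build_inference_request` and the endpoint ID `my-elser-endpoint`.

```python
import json


def build_inference_request(task_type, inference_id, text, chunking_enabled=False):
    """Build the path and JSON body for a hypothetical inference call.

    Assumption: the proposed `chunking_enabled` flag is sent in the request
    body. The issue does not specify where the flag lives.
    """
    path = f"/_inference/{task_type}/{inference_id}"
    body = {"input": text}
    # The flag defaults to false, so it is only included when the caller opts in.
    if chunking_enabled:
        body["chunking_enabled"] = True
    return path, body


# Hypothetical endpoint ID, used for illustration only.
path, body = build_inference_request(
    "sparse_embedding",
    "my-elser-endpoint",
    "A long document that may exceed the model's input limit ...",
    chunking_enabled=True,
)
print("POST", path)
print(json.dumps(body, indent=2))
```

Omitting the flag keeps today's request shape unchanged, which matches the proposed default of false.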
Pinging @elastic/ml-core (Team:ML)