Allow multiple tensor outputs from native Vespa embedders
A promising direction is embedding models that output several representations of the input text; see M3 for context. In short, these models can produce multiple representations from a single forward pass through the model.
The current Embedder interface has:
Tensor embed(String text, Context context, TensorType tensorType)
To allow getting multiple representations, I suggest we add something like:
Map<String,Tensor> embed(String text, Context context, Map<String,TensorType> tensorTypes)
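A rough sketch of how the two methods could coexist: a default implementation of the new method falls back to one forward pass per requested type, so existing embedders keep working unchanged, while native multi-representation embedders override it to do everything in a single pass. The Tensor, TensorType and Context stand-ins below are minimal stubs just to make the sketch self-contained; the real types live in Vespa's com.yahoo packages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-ins for Vespa's types, only to keep this sketch self-contained.
class Tensor { final String spec; Tensor(String spec) { this.spec = spec; } }
class TensorType { final String spec; TensorType(String spec) { this.spec = spec; } }
class Context { }

interface Embedder {
    // Existing single-output method.
    Tensor embed(String text, Context context, TensorType tensorType);

    // Proposed multi-output method. The default falls back to one forward
    // pass per requested type; multi-representation embedders override it
    // to produce all outputs from a single forward pass.
    default Map<String, Tensor> embed(String text, Context context,
                                      Map<String, TensorType> tensorTypes) {
        Map<String, Tensor> result = new LinkedHashMap<>();
        for (Map.Entry<String, TensorType> e : tensorTypes.entrySet())
            result.put(e.getKey(), embed(text, context, e.getValue()));
        return result;
    }
}
```

With a default like this, the new method is purely additive and no existing embedder implementation needs to change.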
Then we need to find a way to express this in IL expressions and in queries, and to get the representations without multiple forward passes of the model.
Those researchers keep messing up our simple and clean design :sob:
I think the best way to solve this on the IL side is to cache the outputs behind the scenes for the duration of processing of the document.
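One way the caching could look, sketched with illustrative names (not Vespa's actual classes): a per-document cache keyed by input text, so each IL expression can pull out its representation while the underlying model runs only once per distinct text, and the cache is cleared when the document finishes processing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch: cache all outputs of one forward pass for the
// duration of a document's processing, so several IL expressions can each
// pick a representation without triggering another pass.
class Tensor { final String value; Tensor(String value) { this.value = value; } }

class CachingEmbedder {
    private final Function<String, Map<String, Tensor>> model; // one forward pass
    private final Map<String, Map<String, Tensor>> cache = new HashMap<>();
    private int forwardPasses = 0;

    CachingEmbedder(Function<String, Map<String, Tensor>> model) { this.model = model; }

    // Called once per (text, representation) by each IL expression;
    // the model itself runs only once per distinct text.
    Tensor embed(String text, String representation) {
        Map<String, Tensor> outputs = cache.computeIfAbsent(text, t -> {
            forwardPasses++;
            return model.apply(t);
        });
        return outputs.get(representation);
    }

    // Invoked when processing of the current document finishes.
    void endDocument() { cache.clear(); }

    int forwardPasses() { return forwardPasses; }
}
```

The cache lifetime matters: scoping it to a single document keeps memory bounded and avoids stale results if the same text recurs across documents with a reconfigured model.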
When we extend the embedder API we should maybe also let it take a list of strings, so arrays can be handled in one call and the complete tensors returned directly, instead of invoking the embedder multiple times from the outside and repackaging the tensors.
If we did https://github.com/vespa-engine/vespa/issues/27822, the embedder could still output just a single tensor, but one containing all the different representations; you would then slice out the one you want. This would have the added advantage of working the same way for indexing and querying.
That will only work if all the representations have the same type though.
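To illustrate the single-tensor idea, here is a hedged schema sketch, with hypothetical field and embedder names, of what it might look like if the embedder emitted one tensor with a mapped representation dimension that is then sliced where needed. The exact syntax depends on what issue #27822 ends up allowing; this is a sketch, not current Vespa syntax.

```
# Hypothetical: one embedder call produces all representations in one tensor
field embeddings type tensor<float>(representation{}, x[768]) {
    indexing: input text | embed my-embedder | attribute
}

# In a rank profile, slice out the representation you need:
function dense_embedding() {
    expression: attribute(embeddings){representation:"dense"}
}
```

As noted above, this only covers the case where all representations share the same dense type; representations with different shapes (e.g. a dense vector plus a sparse lexical weighting) would still need separate outputs.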