
Allow multiple tensor outputs from native Vespa embedders

Open jobergum opened this issue 1 year ago • 3 comments

A promising direction is embedding models that output several representations of the input text; see M3 for context. In short, these models can produce all of their representations in a single forward pass through the model.

The current Embedder interface has

Tensor embed(String text, Context context, TensorType tensorType)

To allow getting multiple representations, I suggest that we add something like this:

Map<String,Tensor> embed(String text, Context context, Map<String,TensorType> tensorTypes)

We would then need a way to express this in indexing language (IL) expressions and in queries, so that all representations are obtained without multiple forward passes of the model.
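As a rough sketch of how such a method could sit next to the existing one: the types below are simplified, hypothetical stand-ins (not the real Vespa `Tensor`, `TensorType`, or `Embedder`), the `Context` argument is omitted, and the output names are purely illustrative. The default method falls back to one call per requested output, so only embedders that can produce several outputs in one pass would need to override it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Vespa's TensorType and Tensor; not the real classes.
record TensorType(String spec) {}
record Tensor(TensorType type, List<Double> values) {}

// Hypothetical interface illustrating the proposal; not the actual Embedder API.
interface MultiOutputEmbedder {

    // Existing-style method: one representation per call (Context omitted here).
    Tensor embed(String text, TensorType tensorType);

    // Proposed-style method: one named tensor per requested type.
    // This default simply calls the single-output method once per entry;
    // a multi-output model would override it to run a single forward pass.
    default Map<String, Tensor> embed(String text, Map<String, TensorType> tensorTypes) {
        Map<String, Tensor> outputs = new LinkedHashMap<>();
        tensorTypes.forEach((name, type) -> outputs.put(name, embed(text, type)));
        return outputs;
    }
}
```

With this shape, existing single-output embedders would keep working unchanged, while callers could always ask for a map of outputs.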

jobergum Feb 07 '24 09:02

Those researchers keep messing up our simple and clean design :sob:

I think the best way to solve this on the IL side is to cache the outputs behind the scenes for the duration of processing of the document.

When we add to the embedder API, we should maybe also let it take a list of strings to handle arrays, so that we can return the complete tensors in one call instead of invoking it multiple times from the outside and then repackaging the tensors.
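The caching idea could look roughly like the sketch below. `EmbeddingCache` is a hypothetical helper, not an existing Vespa class; it assumes one cache instance is created per document and discarded afterwards, and that the model is exposed as a function returning all named outputs for a text in one pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-document cache: create one per document, discard it afterwards.
final class EmbeddingCache<T> {
    private final Map<String, Map<String, T>> outputsByText = new HashMap<>();
    // Assumed to run one forward pass and return every named output for the text.
    private final Function<String, Map<String, T>> model;
    private int forwardPasses = 0;

    EmbeddingCache(Function<String, Map<String, T>> model) {
        this.model = model;
    }

    // Returns the named output for the text, running the model only on a cache miss.
    T get(String text, String outputName) {
        Map<String, T> outputs = outputsByText.computeIfAbsent(text, t -> {
            forwardPasses++;
            return model.apply(t);
        });
        return outputs.get(outputName);
    }

    // Number of times the model was actually invoked.
    int forwardPasses() {
        return forwardPasses;
    }
}
```

Two indexing expressions that ask for different outputs of the same text would then share one model invocation.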

bratseth Feb 07 '24 13:02

If we did https://github.com/vespa-engine/vespa/issues/27822, then the embedder could still output just a single tensor containing all the different representations, and you would slice out the one you want. This would have the added advantage of working the same way for indexing and querying.

andreer Feb 20 '24 00:02

That will only work if all the representations have the same type, though.
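To make the trade-off concrete, here is a minimal sketch using plain arrays rather than Vespa tensors. `PackedOutputs` is a hypothetical helper: it packs several representations into one 2-D array, one row per representation, and "slicing" selects a row. Packing fails as soon as the representations differ in shape, which is the limitation noted here.

```java
import java.util.List;

// Hypothetical helper using plain arrays; not Vespa tensor code.
final class PackedOutputs {

    // Packs equally sized vectors into one 2-D array, one row per representation.
    static double[][] pack(List<double[]> representations) {
        int width = representations.get(0).length;
        double[][] packed = new double[representations.size()][];
        for (int i = 0; i < representations.size(); i++) {
            double[] r = representations.get(i);
            if (r.length != width) {
                throw new IllegalArgumentException(
                        "representations must share one shape to be packed");
            }
            packed[i] = r.clone();
        }
        return packed;
    }

    // Slicing out one representation is just selecting (and copying) its row.
    static double[] slice(double[][] packed, int index) {
        return packed[index].clone();
    }
}
```

Representations of different kinds, for example one dense vector and one sparse or multi-vector output, do not fit this layout without padding or conversion.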

bratseth Feb 20 '24 07:02