Only evaluate ONNX models once in stateless model eval
@bratseth Draft PR. For discussion. Ref https://github.com/vespa-engine/vespa/issues/19901.
To avoid breaking the existing API, this adds a new MultiFunctionEvaluator
which ensures that ONNX models are evaluated only once for all outputs. Example use:
```java
MultiFunctionEvaluator eval = modelsEvaluator.multiEvaluatorOf("mul"); // contains two outputs
Map<String, Tensor> out = eval.bind("input1", input1).bind("input2", input2).evaluate();
assertEquals(6.0, out.get("output1").sum().asDouble(), 1e-9);
assertEquals(5.0, out.get("output2").sum().asDouble(), 1e-9);
```
The name MultiFunctionEvaluator in particular is up for discussion.
Whether it is "multi" is a property of the function, so it seems wrong to make the caller choose?
Since a FunctionEvaluator is a single-use object with a life-cycle how about storing all outputs on it instead, so you could do
```java
FunctionEvaluator eval = modelsEvaluator.evaluatorOf("mul"); // contains two outputs
eval.bind("input1", input1).bind("input2", input2).evaluate();
assertEquals(6.0, eval.result("output1").sum().asDouble(), 1e-9);
assertEquals(5.0, eval.result("output2").sum().asDouble(), 1e-9);
```
We could just return null from evaluate if there is no default output, so that the contract of evaluate is simply "returns the default output, named the same as the function, if any".
So, now, when calling:

```java
FunctionEvaluator eval = modelsEvaluator.evaluatorOf("mul"); // contains two outputs
```

the FunctionEvaluator can contain multiple functions. When it is evaluated, it evaluates all functions and adds them to the results so they can be retrieved independently, as in your code suggestion above. Most (non-ONNX) models contain only a single function, which is returned as the result of evaluate. We don't currently add the outputs to the contexts, as more work would be required to determine which function to evaluate first, etc., for this to be usable.
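To make the life-cycle described above concrete, here is a minimal standalone sketch (hypothetical names, not the actual Vespa API): a single-use evaluator holding several named output functions, evaluating all of them on evaluate(), keeping the results for independent retrieval, and returning the default output (named the same as the function) or null when there is none.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a single-use multi-output evaluator.
// evaluate() computes every output and stores it; result(name)
// retrieves each output independently afterwards.
class SketchEvaluator {
    private final String name;
    private final Map<String, Function<Map<String, Double>, Double>> outputs = new LinkedHashMap<>();
    private final Map<String, Double> bindings = new LinkedHashMap<>();
    private final Map<String, Double> results = new LinkedHashMap<>();

    SketchEvaluator(String name) { this.name = name; }

    SketchEvaluator addOutput(String outputName, Function<Map<String, Double>, Double> fn) {
        outputs.put(outputName, fn);
        return this;
    }

    SketchEvaluator bind(String input, double value) {
        bindings.put(input, value);
        return this;
    }

    /** Evaluates all outputs; returns the default output (same name as the function) or null. */
    Double evaluate() {
        outputs.forEach((outputName, fn) -> results.put(outputName, fn.apply(bindings)));
        return results.get(name); // null when there is no default output
    }

    Double result(String outputName) { return results.get(outputName); }
}
```

With two outputs "output1" and "output2" bound to inputs 2 and 3, a single evaluate() call makes both results available via result(...), matching the usage pattern in the thread.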
For ONNX models that have multiple outputs, the FunctionEvaluator will contain a function for each output. However, before they are evaluated, we scan through the context(s) and evaluate each ONNX model once, in evaluateOnnxModels. Those outputs are added to the contexts, ensuring any model is only evaluated once.
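The "evaluate each ONNX model once" step can be sketched like this (a standalone toy, with hypothetical names; the real evaluateOnnxModels works on Vespa contexts and tensors): the model is invoked a single time, all its outputs are written into the context, and each per-output function then only reads its cached value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: run the (fake) ONNX model once, cache all its
// outputs in the context, and let each output function just look up
// its value instead of re-running the model.
class OnnxOnceSketch {
    static int modelRuns = 0; // counts how often the stand-in model executes

    /** Stand-in for one ONNX model run producing all outputs at once. */
    static Map<String, Double> runModel(double input1, double input2) {
        modelRuns++;
        Map<String, Double> out = new HashMap<>();
        out.put("output1", input1 * input2);
        out.put("output2", input1 + input2);
        return out;
    }

    /** Mirrors the idea of evaluateOnnxModels: one run, outputs cached in the context. */
    static void evaluateOnnxModel(Map<String, Double> context) {
        Map<String, Double> outputs = runModel(context.get("input1"), context.get("input2"));
        outputs.forEach((name, value) -> context.put("onnx(model)." + name, value));
    }

    /** Each per-output function only looks up the cached value. */
    static double outputFunction(String outputName, Map<String, Double> context) {
        return context.get("onnx(model)." + outputName);
    }
}
```

Reading both outputs after a single evaluateOnnxModel call leaves the run counter at 1, which is the property the PR is after.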
Yes, so you have a goal of doing the same also for regular functions?
I hadn't planned on this for this iteration, but this might be something we could consider when/if the need arises?
But why support multiple regular functions within one evaluator then?
Because each output of an ONNX model has its own ExpressionFunction, like this: onnx(model_name).output_name. When this is evaluated, the value has already been inserted into the context by the ONNX model evaluation.
Got it. But should we capture this in the model instead of creating an evaluator having (and always evaluating) every function of the model?
We could add a List of outputs to ExpressionFunction?
By model here you mean the ONNX model (model is such an overloaded term atm)? I don't think the overhead of always evaluating the entire model is practically that large: we do that in Proton today. However, we could do something like:
```java
FunctionEvaluator eval = modelsEvaluator.evaluatorOf("mul", List.of("output1", "output2"));
```

to specify exactly which outputs one would like. If no outputs are passed, all outputs are calculated.
This might make the evaluatorOf function slightly confusing, as it takes a String array argument today, and passing "mul", "output1", "output2" is rewritten to "mul", "output1.output2" due to old support for signatures.
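The proposed overload could behave roughly like this sketch (standalone, hypothetical names; not the actual evaluatorOf signature): compute only the requested outputs, and fall back to computing all of them when the list is empty.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of output selection: an empty request list means
// "all outputs"; otherwise only the named outputs are computed.
class OutputSelectionSketch {
    static Map<String, Double> evaluate(Map<String, Function<Map<String, Double>, Double>> allOutputs,
                                        List<String> requested,
                                        Map<String, Double> bindings) {
        Map<String, Double> results = new LinkedHashMap<>();
        Iterable<String> names = requested.isEmpty() ? allOutputs.keySet() : requested;
        for (String name : names)
            results.put(name, allOutputs.get(name).apply(bindings));
        return results;
    }
}
```

This keeps the no-argument behaviour backwards compatible while letting callers skip outputs they do not need.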
By model I mean the class owning these functions, ai.vespa.models.evaluation.Model. At line 185 in that class you pass the entire list of functions to the FunctionEvaluator constructor if no name is given, and therefore all the functions in the model are always evaluated when that evaluator is used.
However, this is just to achieve the effect of getting all the outputs of an OnnxModel, because each one is represented as a separate ExpressionFunction, as a workaround for ExpressionFunction not supporting multiple outputs. So, I suggest it might be better to add that capability to ExpressionFunction such that we could have an isomorphic mapping between the concepts of ONNX models and our function representation (and, for that matter, be able to represent rank features accurately as functions).
Rank features having multiple outputs is very close to an anti-feature. There might be use cases where it makes sense, but I think it is better to start off with the simple ones. Most multi-output rank features used in first phase have been given individual features computing only what is necessary.
Notwithstanding different opinions on that, it's a feature we have and will continue to have. (The performance consideration cuts both ways: sometimes you solve a computational problem once and that produces multiple values.)
@lesters @bratseth Integrate or close?