spring-ai
Unable to call the Embedding API with sentences of different lengths in a List
I am trying to use the embedding API with this code:
List<List<Double>> embeddings = embeddingModel.embed(List.of("Hello world", "How are you?"));
This fails with:
ai.onnxruntime.OrtException: Supplied array is ragged, expected 4, found 6
When I call it as follows, everything works:
List<List<Double>> embeddings = embeddingModel.embed(List.of("Hello world", "Hello world"));
My conclusion is that the embed method expects all strings to have the same number of tokens. In this case the first sentence has 4 tokens and the second has 6. Is my understanding correct? Is this a reasonable assumption for the API to make?
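For illustration, here is a minimal sketch of the usual way to batch sentences of different token counts: pad each row to the length of the longest one and keep an attention mask that marks the real tokens. This is not the Spring AI implementation; the class name PaddingSketch, the pad id of 0 and the token ids below are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spring AI or ONNX Runtime.
final class PaddingSketch {

    /** Pads every row with padId so that all rows match the longest row. */
    static long[][] padToLongest(long[][] rows, long padId) {
        int maxLen = 0;
        for (long[] row : rows) {
            maxLen = Math.max(maxLen, row.length);
        }
        long[][] padded = new long[rows.length][maxLen];
        for (int i = 0; i < rows.length; i++) {
            Arrays.fill(padded[i], padId);
            System.arraycopy(rows[i], 0, padded[i], 0, rows[i].length);
        }
        return padded;
    }

    /** Builds the matching attention mask: 1 for real tokens, 0 for padding. */
    static long[][] attentionMask(long[][] rows) {
        int maxLen = 0;
        for (long[] row : rows) {
            maxLen = Math.max(maxLen, row.length);
        }
        long[][] mask = new long[rows.length][maxLen]; // zero-initialised
        for (int i = 0; i < rows.length; i++) {
            Arrays.fill(mask[i], 0, rows[i].length, 1L);
        }
        return mask;
    }

    public static void main(String[] args) {
        // Made-up token ids: 4 tokens for the first sentence, 6 for the second.
        long[][] ragged = {
            {101, 7592, 2088, 102},
            {101, 2129, 2024, 2017, 1029, 102}
        };
        System.out.println(Arrays.deepToString(padToLongest(ragged, 0L)));
        System.out.println(Arrays.deepToString(attentionMask(ragged)));
    }
}
```

After padding, both rows have length 6, so the batch is rectangular and would no longer be rejected as ragged.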
Investigating further, I found the following check in TensorInfo.java:
/**
 * Extracts the shape from a multidimensional array. Checks to see if the array is ragged or not.
 *
 * @param shape The shape array to write to.
 * @param curDim The current dimension to check.
 * @param obj The multidimensional array to inspect.
 * @throws OrtException If the array has a zero dimension, or is ragged.
 */
private static void extractShape(long[] shape, int curDim, Object obj) throws OrtException {
    if (shape.length != curDim) {
        int curLength = Array.getLength(obj);
        if (curLength == 0) {
            throw new OrtException(
                "Supplied array has a zero dimension at "
                    + curDim
                    + ", all dimensions must be positive");
        } else if (shape[curDim] == 0L) {
            shape[curDim] = curLength;
        } else if (shape[curDim] != curLength) {
            throw new OrtException(
                "Supplied array is ragged, expected " + shape[curDim] + ", found " + curLength);
        }
        for (int i = 0; i < curLength; i++) {
            extractShape(shape, curDim + 1, Array.get(obj, i));
        }
    }
}
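To see how this check produces the message above, here is a standalone re-creation that runs without onnxruntime on the classpath. It is a sketch under a few assumptions: OrtException is swapped for IllegalArgumentException, the class name ShapeCheckSketch and the check helper are hypothetical, and I am assuming the token ids reach the check as a two-dimensional long array with one row per sentence.

```java
import java.lang.reflect.Array;

// Hypothetical re-creation of the check; not the onnxruntime class itself.
final class ShapeCheckSketch {

    /** Same logic as the quoted method, with IllegalArgumentException instead of OrtException. */
    static void extractShape(long[] shape, int curDim, Object obj) {
        if (shape.length != curDim) {
            int curLength = Array.getLength(obj);
            if (curLength == 0) {
                throw new IllegalArgumentException(
                    "Supplied array has a zero dimension at " + curDim
                        + ", all dimensions must be positive");
            } else if (shape[curDim] == 0L) {
                // First array seen at this depth fixes the expected length.
                shape[curDim] = curLength;
            } else if (shape[curDim] != curLength) {
                throw new IllegalArgumentException(
                    "Supplied array is ragged, expected " + shape[curDim]
                        + ", found " + curLength);
            }
            for (int i = 0; i < curLength; i++) {
                extractShape(shape, curDim + 1, Array.get(obj, i));
            }
        }
    }

    /** Returns "ok" if the array is rectangular, otherwise the error message. */
    static String check(Object array, int rank) {
        try {
            extractShape(new long[rank], 0, array);
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        long[][] ragged = { new long[4], new long[6] };       // 4 and 6 tokens
        long[][] rectangular = { new long[4], new long[4] };  // same length
        System.out.println(check(ragged, 2));
        System.out.println(check(rectangular, 2));
    }
}
```

The first row sets the expected length of dimension 1 to 4, so the second row of length 6 is rejected with "Supplied array is ragged, expected 4, found 6". Two rows of equal length pass, which matches the behaviour with two identical sentences.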