
Not able to call the Embedding API with sentences of different lengths in a List

Open · JirHr opened this issue 1 year ago · 3 comments

I am trying to use the embedding API with this code:

List<List<Double>> embeddings = embeddingModel.embed(List.of("Hello world", "How are you?"));

It fails with: ai.onnxruntime.OrtException: Supplied array is ragged, expected 4, found 6

When I call it like this instead, everything works:

List<List<Double>> embeddings = embeddingModel.embed(List.of("Hello world", "Hello world"));
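To see where the counts 4 and 6 could come from, here is a rough sketch assuming a BERT-style tokenizer that lowercases the text, splits punctuation into its own token, and wraps every sequence in [CLS] ... [SEP]. The `toyTokenize` helper is hypothetical and much simpler than a real WordPiece tokenizer, but for these two inputs it should produce the same counts as the error message. Two identical strings always produce rows of identical length, which would explain why the second call succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenCountDemo {

    // Hypothetical, heavily simplified stand-in for a BERT-style tokenizer:
    // lowercase, split on whitespace, emit every non-alphanumeric character
    // as its own token, and wrap the sequence in [CLS] ... [SEP].
    static List<String> toyTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add("[CLS]");
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            StringBuilder current = new StringBuilder();
            for (char c : word.toCharArray()) {
                if (Character.isLetterOrDigit(c)) {
                    current.append(c);
                } else {
                    // Punctuation ends the current word and becomes a separate token
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                    tokens.add(String.valueOf(c));
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
        }
        tokens.add("[SEP]");
        return tokens;
    }

    public static void main(String[] args) {
        for (String s : List.of("Hello world", "How are you?")) {
            List<String> tokens = toyTokenize(s);
            System.out.println(tokens.size() + " " + tokens);
        }
    }
}
```

Running it reports 4 tokens for "Hello world" and 6 for "How are you?", so a batch built directly from these rows is not rectangular.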

Basically, my conclusion is that the embed method expects all strings to have the same number of tokens. In this case, the first sentence has 4 tokens and the second has 6. Is my understanding correct? Is this a reasonable assumption?
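The usual way to batch sequences of different lengths for a transformer model is to pad every row to the longest one and pass an attention mask that marks real tokens (1) versus padding (0). Below is a minimal sketch of that idea, assuming a pad id of 0 (the [PAD] id in standard BERT vocabularies). The token ids are hypothetical placeholders, and this does not claim to show what Spring AI does internally.

```java
import java.util.Arrays;

public class PaddingDemo {

    // Assumption: 0 is the [PAD] id, as in standard BERT vocabularies.
    static final long PAD_ID = 0L;

    /** Length of the longest row in the batch. */
    static int maxLength(long[][] batch) {
        int maxLen = 0;
        for (long[] row : batch) {
            maxLen = Math.max(maxLen, row.length);
        }
        return maxLen;
    }

    /** Copies each row into a new array padded with PAD_ID up to the longest row. */
    static long[][] padInputIds(long[][] ragged) {
        int maxLen = maxLength(ragged);
        long[][] padded = new long[ragged.length][maxLen];
        for (int i = 0; i < ragged.length; i++) {
            Arrays.fill(padded[i], PAD_ID);
            System.arraycopy(ragged[i], 0, padded[i], 0, ragged[i].length);
        }
        return padded;
    }

    /** Attention mask with the same shape: 1 for real tokens, 0 for padding. */
    static long[][] attentionMask(long[][] ragged) {
        int maxLen = maxLength(ragged);
        long[][] mask = new long[ragged.length][maxLen];
        for (int i = 0; i < ragged.length; i++) {
            Arrays.fill(mask[i], 0, ragged[i].length, 1L);
        }
        return mask;
    }

    public static void main(String[] args) {
        // Hypothetical ids: 101/102 stand in for [CLS]/[SEP], the rest are placeholders
        long[][] ragged = {
            {101, 11, 12, 102},          // "Hello world": 4 tokens
            {101, 21, 22, 23, 24, 102}   // "How are you?": 6 tokens
        };
        System.out.println(Arrays.deepToString(padInputIds(ragged)));
        System.out.println(Arrays.deepToString(attentionMask(ragged)));
    }
}
```

Both resulting arrays are 2 × 6, so a shape check that rejects ragged input would accept them. When the model output is mean-pooled into a sentence embedding, the same mask is typically used to keep padding positions out of the average.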

Investigating the problem further, I found the following check in TensorInfo.java (ONNX Runtime):

  /**
   * Extracts the shape from a multidimensional array. Checks to see if the array is ragged or not.
   *
   * @param shape The shape array to write to.
   * @param curDim The current dimension to check.
   * @param obj The multidimensional array to inspect.
   * @throws OrtException If the array has a zero dimension, or is ragged.
   */
  private static void extractShape(long[] shape, int curDim, Object obj) throws OrtException {
    if (shape.length != curDim) {
      int curLength = Array.getLength(obj);
      if (curLength == 0) {
        throw new OrtException(
            "Supplied array has a zero dimension at "
                + curDim
                + ", all dimensions must be positive");
      } else if (shape[curDim] == 0L) {
        shape[curDim] = curLength;
      } else if (shape[curDim] != curLength) {
        throw new OrtException(
            "Supplied array is ragged, expected " + shape[curDim] + ", found " + curLength);
      }
      for (int i = 0; i < curLength; i++) {
        extractShape(shape, curDim + 1, Array.get(obj, i));
      }
    }
  }
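To watch this check fire without ONNX Runtime on the classpath, the sketch below re-implements the same shape-extraction logic on plain Java arrays. The class and method names are illustrative rather than ONNX Runtime API. It throws `IllegalArgumentException` instead of `OrtException`, and the rank is passed in explicitly for simplicity. Given one 4-token row and one 6-token row, it should fail with the same "expected 4, found 6" message, while a padded batch passes.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

public class RaggedCheckDemo {

    /** Returns the shape of a rectangular array of the given rank, or throws if it is ragged. */
    static long[] shapeOf(Object array, int rank) {
        long[] shape = new long[rank];
        extractShape(shape, 0, array);
        return shape;
    }

    // Same logic as the quoted TensorInfo.extractShape, without the OrtException dependency.
    private static void extractShape(long[] shape, int curDim, Object obj) {
        if (shape.length != curDim) {
            int curLength = Array.getLength(obj);
            if (curLength == 0) {
                throw new IllegalArgumentException(
                    "Supplied array has a zero dimension at " + curDim + ", all dimensions must be positive");
            } else if (shape[curDim] == 0L) {
                // The first row seen at this depth fixes the expected length
                shape[curDim] = curLength;
            } else if (shape[curDim] != curLength) {
                throw new IllegalArgumentException(
                    "Supplied array is ragged, expected " + shape[curDim] + ", found " + curLength);
            }
            for (int i = 0; i < curLength; i++) {
                extractShape(shape, curDim + 1, Array.get(obj, i));
            }
        }
    }

    public static void main(String[] args) {
        long[][] ragged = { new long[4], new long[6] };   // rows of 4 and 6 token ids
        long[][] padded = { new long[6], new long[6] };   // both rows padded to 6

        System.out.println(Arrays.toString(shapeOf(padded, 2)));
        try {
            shapeOf(ragged, 2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The padded batch yields the shape [2, 6]. For the ragged batch, the first row sets the expected length to 4 and the second row trips the check. This fits the reading that the token-id rows are handed to ONNX Runtime without being padded to a common length.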

JirHr · Jul 29 '24 08:07