[BUG] Generating embeddings for arrays of objects is broken starting 2.17

Open Zhangxunmt opened this issue 8 months ago • 3 comments

What is the bug? Following the tutorial at https://opensearch.org/docs/2.17/ml-commons-plugin/tutorials/generate-embeddings/ produces the results shown below. The title_embedding is not ingested as a new field in each element of books; instead it appears under _ingest._value inside _source. These results do not match what's in the tutorial.

{
  "docs": [
    {
      "doc": {
        "_index": "my_books",
        "_id": "1",
        "_source": {
          "_ingest": {
            "_value": {
              "title_embedding": [
                0.009794682,
                0.04060341,
                0.016146386,
                ...
                -0.03778624
              ]
            }
          },
          "books": [
            {
              "title": "first book",
              "description": "This is first book"
            },
            {
              "title": "second book",
              "description": "This is second book"
            }
          ]
        },
        "_ingest": {
          "_value": null,
          "timestamp": "2025-03-14T22:02:43.240620757Z"
        }
      }
    }
  ]
}

How can one reproduce the bug? Follow the tutorial, and you will reproduce the error.

What is the expected behavior? The results shown in the tutorial are expected: the new embeddings are ingested into each element of books.
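The gap between the observed and expected output can be checked mechanically. The following is a minimal sketch, not part of ml-commons: `check_simulate_source` is a hypothetical helper, and the `books` and `title_embedding` defaults are taken from the response in this report. It flags an embedding that leaked into `_source._ingest._value` and any array element that is missing its embedding.

```python
from typing import Any


def check_simulate_source(
    source: dict[str, Any],
    array_field: str = "books",
    embedding_key: str = "title_embedding",
) -> list[str]:
    """Return a list of problems found in one simulated document's _source."""
    problems: list[str] = []

    # Symptom from this report: the embedding is written to a literal
    # `_ingest._value` object inside _source instead of an array element.
    leaked = (source.get("_ingest") or {}).get("_value") or {}
    if embedding_key in leaked:
        problems.append(f"{embedding_key} leaked into _source._ingest._value")

    # Expected behavior: every element of the array carries its own embedding.
    for i, item in enumerate(source.get(array_field, [])):
        if embedding_key not in item:
            problems.append(f"{array_field}[{i}] has no {embedding_key}")

    return problems


# Copy of the reported _source; the vector is truncated in the report,
# so only the values that were shown are included here.
observed = {
    "_ingest": {
        "_value": {
            "title_embedding": [0.009794682, 0.04060341, 0.016146386, -0.03778624]
        }
    },
    "books": [
        {"title": "first book", "description": "This is first book"},
        {"title": "second book", "description": "This is second book"},
    ],
}

for problem in check_simulate_source(observed):
    print(problem)
```

Run against the `_source` of each entry in `docs` from a simulate response, an empty list would indicate the behavior the tutorial describes.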

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

Zhangxunmt, Mar 19 '25 17:03