Configurable image descriptor fn `get_image_description` and support for list type content in messages

Open suneeta-mall opened this issue 9 months ago • 3 comments

🚀 The feature

Hey, thanks for your work on mem0. I was wondering what the vision is for improving support for multi-modal input/messages. At the moment I am running into a few issues, namely:

  1. List-based content cannot be added/indexed by mem0 (with Redis backing). An example of list-based content is shown here:
    [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://someimage_somewhere_jumbo.jpeg"
                    }
                }
            ]
        }
    ]

These fail with the error `TypeError: list indices must be integers or slices, not str`, even though this is an entirely correct format for multi-modal OpenAI messages.
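For illustration, here is a minimal sketch of how a message parser could accept string, dict, and list content alike. It is not mem0's actual code: `split_content` and the `Content` alias are hypothetical names, and the sketch assumes only the `text` and `image_url` part types shown in this issue.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical alias: the three content shapes discussed in this issue.
Content = Union[str, Dict[str, Any], List[Dict[str, Any]]]


def split_content(content: Content) -> Tuple[List[str], List[str]]:
    """Return (texts, image_urls) for str, dict, or list-shaped content."""
    if isinstance(content, str):
        return [content], []
    # Treat a single dict part as a one-element list so both shapes share a path.
    parts = [content] if isinstance(content, dict) else list(content)
    texts: List[str] = []
    image_urls: List[str] = []
    for part in parts:
        if part.get("type") == "text":
            texts.append(part["text"])
        elif part.get("type") == "image_url":
            image_urls.append(part["image_url"]["url"])
    return texts, image_urls


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://someimage_somewhere_jumbo.jpeg"}},
    ],
}
texts, image_urls = split_content(message["content"])
```

Indexing a list with a string key (for example `content["type"]`) is what raises the `TypeError` quoted above, so checking the shape before indexing avoids it.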

By contrast, I can see that JSON/dict content is processed fine, i.e. the following is okay:

```json
[
    {
        "role": "user",
        "content": {
            "type": "text",
            "text": "Describe this image"
        }
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://someimage_somewhere_jumbo.jpeg"
            }
        }
    }
]
```
  2. The image descriptor method `get_image_description` assumes a call to OpenAI. Can we make this method configurable for `base_url`, `api_key` and `model`, i.e., in general, have it accept a `BaseLlmConfig` so it can point to any LLM?
    https://github.com/mem0ai/mem0/blob/f4dc5f6c718c92f03394cba438e897059860585a/mem0/memory/utils.py#L48
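
As a rough sketch of what a configurable descriptor might look like (not mem0's current implementation): `ImageDescriberConfig`, its default values, and the extra `config`/`client` parameters are all hypothetical. The request assumes an OpenAI-compatible chat completions endpoint and uses the `openai` Python SDK only when no client is injected.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ImageDescriberConfig:
    """Hypothetical config; the issue suggests reusing BaseLlmConfig instead."""
    model: str = "gpt-4o-mini"       # assumed default, not taken from mem0
    api_key: Optional[str] = None    # None lets the SDK read OPENAI_API_KEY
    base_url: Optional[str] = None   # None means the SDK's default endpoint
    prompt: str = "Describe this image."


def get_image_description(image_url: str, config: ImageDescriberConfig, client: Any = None) -> str:
    """Ask a vision-capable, OpenAI-compatible model to describe an image."""
    if client is None:
        # Imported lazily so the sketch loads even without the SDK installed.
        from openai import OpenAI
        client = OpenAI(api_key=config.api_key, base_url=config.base_url)
    response = client.chat.completions.create(
        model=config.model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": config.prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```

Under these assumptions, pointing `base_url` at a self-hosted OpenAI-compatible server and setting `model` accordingly would swap the descriptor without touching the calling code.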

Motivation, pitch

To make mem0 more usable in a multi-modal setting where custom image descriptors can be more valuable for cost and domain fit purposes.

suneeta-mall • Mar 03 '25 04:03

Thanks for opening the issue @suneeta-mall. We are happy to add support for it.

deshraj • Mar 04 '25 07:03

Hey @suneeta-mall I'm working on this issue, so can you please elaborate on why there is a need to pass the text field in the content key? Want to understand the use-case here. Thanks!

Dev-Khant • Mar 04 '25 18:03

> Hey @suneeta-mall I'm working on this issue, so can you please elaborate on why there is a need to pass the text field in the content key? Want to understand the use-case here. Thanks!

Hey @Dev-Khant I am not quite sure I am following your question. These message formats are OpenAI-compatible/standard formats where `content` can be a `str` or a list of text and image payloads. These are required for VQA (visual question answering) questions.

suneeta-mall • Mar 07 '25 20:03