Configurable image descriptor fn `get_image_description` and support for list type content in messages
🚀 The feature
Hey, thanks for your work on mem0. I was wondering what the vision is for increasing support for multi-modal input/messages. At the moment I am running into a few issues, namely:
- List-based `content` cannot be added/indexed by mem0 (with Redis backing). An example of list-based content is shown here:
```json
[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Describe this image"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://someimage_somewhere_jumbo.jpeg"
        }
      }
    ]
  }
]
```
These fail with the error `TypeError: list indices must be integers or slices, not str`, even though this is an entirely correct multi-modal OpenAI message format.
However, I can see that JSON/dict content is processed fine, i.e. the following is okay:
```json
[
  {
    "role": "user",
    "content": {
      "type": "text",
      "text": "Describe this image"
    }
  },
  {
    "role": "user",
    "content": {
      "type": "image_url",
      "image_url": {
        "url": "https://someimage_somewhere_jumbo.jpeg"
      }
    }
  }
]
```
- The image descriptor method `get_image_description` assumes a call to OpenAI. Can we make this method configurable for `base_url`, `api_key` and `model`, i.e. in general accept a `BaseLlmConfig`, so that it can point to any LLM?
https://github.com/mem0ai/mem0/blob/f4dc5f6c718c92f03394cba438e897059860585a/mem0/memory/utils.py#L48
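To make the second request concrete, here is a minimal sketch of what a configurable descriptor could look like. It is not mem0's implementation: `ImageDescriberConfig`, `build_description_request`, `describe_image` and the `send` callable are all hypothetical names chosen for illustration, and the payload simply follows the OpenAI-style message format shown above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ImageDescriberConfig:
    """Hypothetical settings object; the fields mirror those requested in this issue."""
    model: str
    base_url: Optional[str] = None  # None means "use the provider's default endpoint"
    api_key: Optional[str] = None
    prompt: str = "Describe this image."


def build_description_request(image_url: str, config: ImageDescriberConfig) -> dict:
    """Build an OpenAI-style chat payload asking the model to describe one image."""
    return {
        "model": config.model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": config.prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_image(
    image_url: str,
    config: ImageDescriberConfig,
    send: Callable[[dict, ImageDescriberConfig], str],
) -> str:
    """Delegate the actual LLM call to `send`, so any provider can be plugged in."""
    return send(build_description_request(image_url, config), config)


# Stand-in for a real client call (assumption: a real `send` would use
# config.base_url and config.api_key to reach the chosen endpoint).
def fake_send(payload: dict, config: ImageDescriberConfig) -> str:
    url = payload["messages"][0]["content"][1]["image_url"]["url"]
    return f"[{config.model}] description of {url}"


config = ImageDescriberConfig(model="my-local-vlm", base_url="http://localhost:8000/v1")
description = describe_image("https://someimage_somewhere_jumbo.jpeg", config, fake_send)
```

Keeping the transport behind a callable means the descriptor does not need to know which provider it is talking to; only the config and the `send` function change.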
Motivation, pitch
To make mem0 more usable in a multi-modal setting where custom image descriptors can be more valuable for cost and domain fit purposes.
Thanks for opening the issue @suneeta-mall. We are happy to add support for it.
Hey @suneeta-mall I'm working on this issue, so can you please elaborate on why there is a need to pass the `text` field in the `content` key? Want to understand the use-case here. Thanks!
Hey @Dev-Khant I am not quite sure I am following your question. These message formats are OpenAI-compatible/standard formats where `content` can be a `str` or a list of text and image payloads. These are required for VQA questions.
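As an illustration of the three shapes `content` can take, here is a small sketch. The `split_content` helper is hypothetical and not part of mem0. The first part shows that indexing list content as if it were a dict produces the same error message reported above; whether that is the exact code path inside mem0 is an assumption.

```python
from typing import Any, List, Tuple

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://someimage_somewhere_jumbo.jpeg"}},
    ],
}

# Treating list content like a dict raises the TypeError from the report.
try:
    message["content"]["type"]
except TypeError as exc:
    error = str(exc)


def split_content(content: Any) -> Tuple[List[str], List[str]]:
    """Return (texts, image_urls) for str, dict, or list-of-parts content."""
    if isinstance(content, str):
        return [content], []
    # A single dict is handled as a one-element list of parts.
    parts = [content] if isinstance(content, dict) else list(content)
    texts: List[str] = []
    image_urls: List[str] = []
    for part in parts:
        if part.get("type") == "text":
            texts.append(part["text"])
        elif part.get("type") == "image_url":
            image_urls.append(part["image_url"]["url"])
    return texts, image_urls


texts, image_urls = split_content(message["content"])
```

With a helper along these lines, the text parts can be indexed as usual while each image URL is passed to whichever image descriptor is configured.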