haystack feat: Multimodal ChatMessage

Related Issues

fixes #7848

Proposed Changes:

As suggested, I added the ability for ChatMessage to store a str, a ByteStream, or a list of both, so that it can hold any content type you might want to use in a chat environment, such as images or text. For now we only support text, image URLs, and base64-encoded images, since these are the originally requested types; more types could be implemented if needed. I also added serialization and updated ChatPromptBuilder to work with this new ChatMessage. The ByteStream class was updated with a method to populate mime_type more effectively.
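
To make the shape of the change easier to picture, here is a minimal, self-contained sketch of a message whose content is a str, a byte stream, or a list of both. It is not the code from this PR: `ByteStreamSketch`, `ChatMessageSketch`, `with_guessed_mime_type` and `parts` are hypothetical names invented for illustration, and guessing the MIME type from a file name via the standard-library `mimetypes` module is an assumption about how `mime_type` might be populated.

```python
import mimetypes
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ByteStreamSketch:
    """Hypothetical stand-in for a byte container (not Haystack's ByteStream)."""

    data: bytes
    mime_type: Optional[str] = None

    @classmethod
    def with_guessed_mime_type(cls, data: bytes, filename: str) -> "ByteStreamSketch":
        # Assumption: derive the MIME type from the file extension.
        guessed, _encoding = mimetypes.guess_type(filename)
        return cls(data=data, mime_type=guessed)


Content = Union[str, ByteStreamSketch]


@dataclass
class ChatMessageSketch:
    """Hypothetical message whose content is text, bytes, or a list of both."""

    role: str
    content: Union[Content, List[Content]]

    def parts(self) -> List[Content]:
        # Normalise to a list so callers can always iterate over the content.
        return self.content if isinstance(self.content, list) else [self.content]


image = ByteStreamSketch.with_guessed_mime_type(b"\x89PNG", "cat.png")
message = ChatMessageSketch(role="user", content=["What is in this image?", image])
```

Normalising the content to a list in one place keeps downstream code (for example a prompt builder) from having to branch on the three possible shapes.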

How did you test it?

I added unit tests for all the new functionality.

Notes for the reviewer

I believe this works fine, but maybe it will be hard to explain this new functionality in the docs.

Checklist

I have read the contributors guidelines and the code of conduct ✅
I have updated the related issue with new insights and changes ✅
I added unit tests and updated the docstrings ✅
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
I documented my code ❌
I ran pre-commit hooks and fixed any issue ✅

Jun 26 '24 22:06 CarlosFerLo

@silvanocerza let me know if this was what you had in mind for this feature.

Jun 28 '24 22:06 CarlosFerLo

Pull Request Test Coverage Report for Build 9719406674

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%
Total:	54

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

Jun 28 '24 23:06 coveralls

Pull Request Test Coverage Report for Build 9719403728

Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%
Total:	54

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

Jun 28 '24 23:06 coveralls

Pull Request Test Coverage Report for Build 9724444663

Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%
Total:	54

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

Jun 29 '24 13:06 coveralls

Pull Request Test Coverage Report for Build 9724642783

Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%
Total:	54

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

Jun 29 '24 14:06 coveralls

I do not know how to fix all the mypy errors. It gets confused about typing when the same variable name is used with two different types in two different iterations of a for loop. The usage is safe, by the way.

Jun 29 '24 20:06 CarlosFerLo

mypy seems to be really sensitive to code that works with different types and uses isinstance. I resolved my mypy errors by casting explicitly, so that mypy is now confident of the type.

This applies to attr-defined and union-attr.
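
The comment above does not include code, so the following is only an illustrative sketch of the casting approach, assuming content typed as a union of `str` and `bytes`. The function `describe` and the alias `Part` are hypothetical names. `typing.cast` has no runtime effect; it only tells the type checker which type to assume. Binding each cast result to its own name also avoids reusing one variable name for two different types.

```python
from typing import List, Union, cast

Part = Union[str, bytes]


def describe(content: Union[Part, List[Part]]) -> List[str]:
    """Return a short description of each part of the content."""
    items = content if isinstance(content, list) else [content]
    described: List[str] = []
    for item in items:
        if isinstance(item, str):
            # Explicit cast: the checker now treats `text` as str only.
            text = cast(str, item)
            described.append(f"text:{text.upper()}")
        else:
            # A separate name for the bytes case, rather than reusing `text`.
            data = cast(bytes, item)
            described.append(f"bytes:{len(data)}")
    return described
```

For example, `describe(["hi", b"abc"])` walks both branches, one per list element.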

Jul 02 '24 03:07 lbux

Hey all, just wondering if this PR is going ahead? This is needed badly as things move to multi-modality.

Jul 23 '24 07:07 michaeltremeer

@michaeltremeer heyy, it seems like everyone is on holiday, and I have no write access, so I can't merge it into main, but I expect someone will in the near future.

Jul 23 '24 10:07 CarlosFerLo

> @michaeltremeer heyy, it seems like everyone is on holiday, and I have no write access, so I can't merge it into main, but I expect someone will in the near future.

No worries Carlos, I love your work in getting this PR done. As an aside, I've been getting acquainted with the library, and while it appears to be better suited for multi-modal pipelines than griptape and others, it still seems quite text-centric and a little hard to work with when you want to weave text, image, audio, and even dataframe/JSON data together. I think your work here is a great start, but I do wonder whether some of the assumptions behind many of the components make sense (e.g. that Documents are generally assumed to be text data, along with the lack of tools for converting non-text Documents or ByteStream objects into chat messages or prompt templates). It's definitely an area that could be prioritised to make the library easier to extend in future.

Jul 23 '24 11:07 michaeltremeer

@michaeltremeer

Thanks for your suggestions, I will try and implement the 'png' thing this evening. Anything you might need implemented, just say it and I will do my best.

Regarding the limited support for non-text types, I completely agree with you. I am looking for a way to introduce object support. We could talk about how to add support for more complex types, and I believe we can accomplish it.

Jul 23 '24 12:07 CarlosFerLo

Pull Request Test Coverage Report for Build 10074443031

Details

0 of 0 changed or added relevant lines in 0 files are covered.
19 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.04%) to 90.084%

Files with Coverage Reduction	New Missed Lines	%
components/builders/answer_builder.py	1	98.31%
dataclasses/chat_message.py	6	95.8%
components/builders/chat_prompt_builder.py	12	88.07%
Total:	19

Totals
Change from base Build 10041545511:	-0.04%
Covered Lines:	6995
Relevant Lines:	7765

💛 - Coveralls

Jul 23 '24 21:07 coveralls

@michaeltremeer I thought about it and I do not know why I didn't implement it this way :) I have added all the file encoding processing directly in the ChatMessage code, as it is short and simple and abstracting it seems like overkill. But if you want, I can do so; it is simple.

Jul 24 '24 09:07 CarlosFerLo

Hi guys, this is a very useful PR. Thanks for that. What is the status of it? Could this be reviewed and merged soon?

Aug 30 '24 05:08 jkondek1

Hi @jkondek1, thanks for the support. I do not know whether they are going to implement it, but I am willing to work with them to resolve all conflicts if they want.

Sep 04 '24 11:09 CarlosFerLo

Any updates on if this PR could get approved + merged? This is super useful for us.

Also curious whether there are any other workarounds for using OpenAI's or Claude's image type/multimodal chat messages.

Sep 16 '24 22:09 anishpdoshi

Closing, this is not the direction we want to go with multi modal chat messages.

If you still want to contribute this feature, feel free to do so following my suggestion from https://github.com/deepset-ai/haystack/issues/7848#issuecomment-2178119469.

Sep 24 '24 14:09 silvanocerza

@silvanocerza Is there any workaround for passing multimodal chat messages into a Haystack generator?

Sep 27 '24 20:09 anishpdoshi

> @silvanocerza Is there any workaround for passing multimodal chat messages into a Haystack generator?

Hi Anish, I am not sure whether this helps you, but we needed to pass an image to the generator.generate() method and found that you can pass the whole "content" list (as it would appear if you were using e.g. the OpenAI client) to the method.

    [
        {
            "type": "text",
            "text": "I want to know more about this image"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "data:image/jpeg;base64," + "base64_encoded_image"
            }
        }
    ]
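
As a rough illustration of how a list like the one above could be assembled from raw image bytes, here is a small sketch using only the Python standard library. The helper name `build_image_content` and its parameters are hypothetical, and how the resulting list is handed to a generator is not shown, since that depends on the generator being used.

```python
import base64
from typing import Any, Dict, List


def build_image_content(
    prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg"
) -> List[Dict[str, Any]]:
    """Build an OpenAI-style content list with one text part and one image part."""
    # Encode the raw bytes as base64 text so they fit inside a data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        },
    ]


# Usage with placeholder bytes; a real call would read the bytes from an image file.
content = build_image_content("I want to know more about this image", b"\xff\xd8\xff")
```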

For OpenAI reference, see https://platform.openai.com/docs/guides/text-generation/building-prompts

Oct 04 '24 15:10 jkondek1