FEATURE: Add Multi Modal Capabilities to Flowise

Open vinodkiran opened this issue 1 year ago • 13 comments

vinodkiran avatar Dec 21 '23 04:12 vinodkiran

Update after touchup: image

HenryHengZJ avatar Jan 17 '24 00:01 HenryHengZJ

Hi, it seems like only GPT-4 Vision will be added in this pull. Will Ollama+LLaVA be added in the future? Ollama itself already supports LLaVA, but the chain does not. I hope this can be added as well.

treesheeptw avatar Jan 23 '24 07:01 treesheeptw

> Hi, it seems like only GPT-4 Vision will be added in this pull. Will Ollama+LLaVA be added in the future? Ollama itself already supports LLaVA, but the chain does not. I hope this can be added as well.

We'll roll out ChatOpenAI first, then move on to Ollama.

HenryHengZJ avatar Jan 24 '24 15:01 HenryHengZJ

@vinodkiran pushed a couple of UI fixes:

  • error message when audio recording is not supported
  • messages not autoscrolling to the bottom

0xi4o avatar Jan 30 '24 10:01 0xi4o

@vinodkiran @HenryHengZJ Made a couple of changes:

  • Change the UI for Speech to text configuration
  • Made messages in view message dialog consistent with internal chat

0xi4o avatar Feb 19 '24 10:02 0xi4o

Couple more updates:

  • Removed the status indicator in speech to text dialog
  • When submitting audio inputs, user messages will be updated (in the frontend) with the transcribed question using the selected speech to text provider. This was already available when there was only audio input but it only showed on refresh or when closing and opening the chat window. It will now show immediately after getting the response from the backend and it will now work even with multiple uploads (like images w/ audio).
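A minimal sketch of the frontend update described above, assuming a simple in-memory message list. The `ChatMessage` shape and the `applyTranscription` helper are hypothetical names used for illustration and are not taken from this PR; the idea is only that, once the backend returns the transcribed text, the most recent user message is replaced with it right away instead of waiting for a refresh.

```typescript
// Hypothetical message shape; the real chat state may carry more fields
// (uploads, timestamps, and so on).
interface ChatMessage {
  type: "userMessage" | "apiMessage";
  message: string;
}

// Return a new list in which the most recent user message shows the
// transcribed text. The input array is not mutated, so it can be used
// directly as a state update.
function applyTranscription(
  messages: ChatMessage[],
  transcription: string
): ChatMessage[] {
  let lastUser = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.type === "userMessage") {
      lastUser = i;
      break;
    }
  }
  // Nothing to update if the user has not sent anything yet.
  if (lastUser === -1) return messages;
  return messages.map((m, i) =>
    i === lastUser ? { ...m, message: transcription } : m
  );
}
```

Because the helper returns a new array, it could be called from the response handler as soon as the backend reply arrives, whether the upload was audio only or audio combined with images.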

0xi4o avatar Feb 19 '24 14:02 0xi4o

related chat embedded PR

HenryHengZJ avatar Feb 21 '24 18:02 HenryHengZJ

@HenryHengZJ @chungyau97 Issues have all been fixed. We can review and merge this. I think everything's good to go on @vinodkiran's end too.

0xi4o avatar Feb 22 '24 10:02 0xi4o

@0xi4o, thanks for the fix for invalid characters in file 2024-02-21 20_43_24-Elon Musk - Elon Musk.pdf — Mozilla Firefox.png image

chungyau97 avatar Feb 24 '24 04:02 chungyau97

Another error: importing a flow and then turning on Speech To Text removes the file and mic icons. image

Steps to reproduce error:

  1. Click Add New
  2. Import MultiModal chatflow
  3. Save chatflow
  4. Turn on Speech To Text

chungyau97 avatar Feb 24 '24 04:02 chungyau97

Another issue:

1.) Open OpenAI Whisper and put in a credential: image

2.) Switch to AssemblyAI; the OpenAI credential still shows there: image

I think that's because of the credentialNames in the useEffect. I removed it because it caused an infinite loop, but you were trying to put it there to prevent this scenario, right?
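One way to avoid both the stale credential and the effect loop is to keep credentials keyed by provider and read the selected provider's entry directly, rather than syncing a single credential field through an effect. The sketch below only illustrates that idea: the provider names, the `SpeechToTextState` shape, and both helpers are assumptions, not the code in this PR.

```typescript
// Hypothetical provider identifiers and settings shape.
type ProviderName = "openAIWhisper" | "assemblyAiTranscribe";

interface ProviderConfig {
  status: boolean;
  credentialId?: string;
}

type SpeechToTextState = Partial<Record<ProviderName, ProviderConfig>>;

// Store a credential under the provider it belongs to, leaving the
// other providers' entries untouched.
function setCredential(
  state: SpeechToTextState,
  provider: ProviderName,
  credentialId: string
): SpeechToTextState {
  const current: ProviderConfig = state[provider] ?? { status: false };
  return { ...state, [provider]: { ...current, credentialId } };
}

// Look up the credential for the currently selected provider only.
// A provider that was never configured yields undefined, so the
// credential field renders empty instead of showing another provider's value.
function credentialFor(
  state: SpeechToTextState,
  provider: ProviderName
): string | undefined {
  return state[provider]?.credentialId;
}
```

Since the displayed credential is derived from the selected provider on each render, no effect has to watch the credential names.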

HenryHengZJ avatar Feb 24 '24 05:02 HenryHengZJ

> Another error: importing a flow and then turning on Speech To Text removes the file and mic icons. image
>
> Steps to reproduce error:
>
>   1. Click Add New
>   2. Import MultiModal chatflow
>   3. Save chatflow
>   4. Turn on Speech To Text

Solved.

HenryHengZJ avatar Feb 24 '24 07:02 HenryHengZJ

> Another issue:
>
> 1.) Open OpenAI Whisper and put in a credential: image
>
> 2.) Switch to AssemblyAI; the OpenAI credential still shows there: image
>
> I think that's because of the credentialNames in the useEffect. I removed it because it caused an infinite loop, but you were trying to put it there to prevent this scenario, right?

Solved.

HenryHengZJ avatar Feb 24 '24 07:02 HenryHengZJ

Could there be an option to configure another (self-hosted) audio or image model?

HermesMacedo avatar Feb 28 '24 17:02 HermesMacedo

Awesome feature. It would be great to add it to the chat embed as well.

nitromir avatar Mar 06 '24 14:03 nitromir