Flowise
FEATURE: Add Multi Modal Capabilities to Flowise
Update after touchup:
Hi, it seems like only GPT-4 Vision will be added in this PR. Will Ollama + LLaVA be added in the future? Ollama itself already supports LLaVA, but the chain does not. I hope this can be added as well.
We'll roll out ChatOpenAI first, then move on to Ollama.
@vinodkiran pushed a couple of UI fixes:
- error message when audio recording is not supported
- messages not autoscrolling to the bottom
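The first UI fix can be illustrated with a small feature check. This is a minimal sketch, not the code from this PR: it assumes recording needs both navigator.mediaDevices.getUserMedia and the MediaRecorder API, and the helper names and error text are hypothetical. The browser objects are passed in as a parameter so the check can also run outside a browser.

```typescript
// Hypothetical shape of the browser APIs we depend on (assumption, not Flowise code).
interface AudioEnv {
  mediaDevices?: { getUserMedia?: unknown }
  MediaRecorder?: unknown
}

// Recording needs microphone access (getUserMedia) and an encoder (MediaRecorder).
function isAudioRecordingSupported(env: AudioEnv): boolean {
  return typeof env.mediaDevices?.getUserMedia === 'function' && typeof env.MediaRecorder === 'function'
}

// Returns the (hypothetical) error text to show, or null when recording is available.
function recordingErrorMessage(env: AudioEnv): string | null {
  return isAudioRecordingSupported(env) ? null : 'Audio recording is not supported in this browser.'
}

// In a browser this could be called as:
// recordingErrorMessage({ mediaDevices: navigator.mediaDevices, MediaRecorder: (globalThis as any).MediaRecorder })
```

Note that navigator.mediaDevices is undefined outside secure contexts (e.g. plain HTTP), which this check also reports as unsupported.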
@vinodkiran @HenryHengZJ Made a couple of changes:
- Changed the UI for the Speech to Text configuration
- Made messages in the view message dialog consistent with the internal chat
Couple more updates:
- Removed the status indicator in the Speech to Text dialog
- When submitting audio inputs, the user message is now updated in the frontend with the question transcribed by the selected speech-to-text provider. This already worked when audio was the only input, but the transcription only appeared after a refresh or after closing and reopening the chat window. It now appears as soon as the backend responds, and it also works with multiple uploads (e.g. images together with audio).
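The transcription update can be sketched as an immutable update of the message list. This is a hypothetical illustration, not Flowise's implementation: the ChatMessage and Upload shapes and the applyTranscription helper are assumptions. It only updates the latest user message, only when that message has an audio upload, and keeps its other uploads (such as images) intact.

```typescript
// Hypothetical message shapes; field names are assumptions, not Flowise's actual types.
interface Upload { type: 'file' | 'audio'; name: string }
interface ChatMessage { type: 'userMessage' | 'apiMessage'; message: string; uploads?: Upload[] }

// Returns a new list where the latest user message shows the transcribed question.
function applyTranscription(messages: ChatMessage[], transcribed: string | undefined): ChatMessage[] {
  if (!transcribed) return messages
  // Locate the most recent user message.
  let idx = -1
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].type === 'userMessage') {
      idx = i
      break
    }
  }
  if (idx === -1) return messages
  // Only audio submissions carry a transcription.
  if (!messages[idx].uploads?.some((u) => u.type === 'audio')) return messages
  // Copy the message, replacing only its text; uploads are preserved.
  return messages.map((m, i) => (i === idx ? { ...m, message: transcribed } : m))
}
```

Returning a new array (rather than mutating) lets a React state setter detect the change and re-render immediately.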
Related chat embed PR
@HenryHengZJ @chungyau97 Issues have all been fixed. We can review and merge this. I think everything's good to go on @vinodkiran's end too.
@0xi4o, thanks for the fix for invalid characters in file names.
Another error: after importing a flow, turning on Speech To Text removes the file and mic icons.
Steps to reproduce:
- Click Add New
- Import the MultiModal chatflow
- Save the chatflow
- Turn on Speech To Text
Another issue:
1.) Open OpenAI Whisper and enter a credential.
2.) Switch to AssemblyAI; the OpenAI credential is still shown there.
I think that's because of the credentialNames in the useEffect. I removed it because it caused an infinite loop, but you were trying to put it there to prevent this scenario, right?
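One likely cause of the loop, if credentialNames is an array rebuilt on every render, is that each render hands the effect a new dependency, so the effect sets state, triggers a re-render, and fires again. One way to avoid both the loop and the stale credential is to decide what to keep when the provider changes, inside the change handler. The sketch below assumes hypothetical Provider, Credential and SttConfig shapes and a switchProvider helper; none of these are Flowise's actual types.

```typescript
// Hypothetical shapes for illustration only.
interface Provider { name: string; credentialNames: string[] }
interface Credential { id: string; credentialName: string }
interface SttConfig { provider: string; credentialId?: string }

// Intended to be called from the provider dropdown's onChange handler,
// so it runs once per switch instead of on every render.
function switchProvider(config: SttConfig, next: Provider, credentials: Credential[]): SttConfig {
  const current = credentials.find((c) => c.id === config.credentialId)
  // Keep the selected credential only if the new provider accepts its type.
  const credentialId =
    current && next.credentialNames.includes(current.credentialName) ? current.id : undefined
  return { provider: next.name, credentialId }
}
```

Because the reset happens in the event handler, there is no effect dependency that can change on every render.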
Re: the file and mic icons disappearing after importing a flow and turning on Speech To Text: solved.
Re: the OpenAI credential still showing after switching to AssemblyAI: solved.
Could there be an option to configure another audio or image model (self-hosted)?
Awesome feature. It would be great to add it to the chat embed as well.