extended_openai_conversation
How to upgrade to gpt-4o
Hello. What I have in mind is that GPT could analyze images from the cameras connected to Home Assistant, for example: "How many people do you see on the camera?", "What color are their clothes?", "Do they look suspicious?"
As a first step, I set the language model to gpt-4o in the Extended OpenAI Conversation settings. As a result, the response speed is noticeably better, but when I asked it to analyze the camera images, it replied that it doesn't have access to the cameras or that it can't process images.
After a little searching, I found this: https://community.home-assistant.io/t/gpt-4o-vision-capabilities-in-home-assistant/729241
I installed it, and after a day I got it working, like this: when I say "what do you see?" to the OpenAI conversation agent,
1. my automation or script is executed,
2. a snapshot is taken from the camera I specified,
3. that photo is sent to ha-gpt4vision,
4. the response from ha-gpt4vision is converted to speech with TTS.
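Roughly, my automation looks like the sketch below. The camera entity, file path, media player, and TTS service are just examples from my setup, and the gpt4vision service name and fields are my assumption of the ha-gpt4vision syntax, so check the integration's documentation for the exact options:

```yaml
alias: "Camera vision - what do you see"
trigger:
  # Sentence trigger for the voice assistant
  - platform: conversation
    command:
      - "what do you see"
action:
  # 1. Take a snapshot from the chosen camera (camera.front_door is just an example entity)
  - service: camera.snapshot
    target:
      entity_id: camera.front_door
    data:
      filename: /config/www/tmp/snapshot.jpg
  # 2. Send the snapshot to ha-gpt4vision
  #    (service name, fields and the response_variable are my assumption of its syntax)
  - service: gpt4vision.image_analyzer
    data:
      message: "Describe what you see on this camera."
      image_file: /config/www/tmp/snapshot.jpg
      max_tokens: 100
    response_variable: vision
  # 3. Speak the answer with TTS (swap in whichever TTS service you use)
  - service: tts.google_translate_say
    data:
      entity_id: media_player.living_room_speaker
      message: "{{ vision.response_text }}"
mode: single
```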
To be honest, the result is good, lol :) But it has a lot of problems. For example, it is very limited, and sometimes its TTS output interferes with the OpenAI conversation agent's reply (both TTS sounds play at the same time).
I also have to write a lot of scripts to run ha-gpt4vision. For example: if phrase A is said, take a picture and analyze it; if phrase B is said, take a picture and explain what the object is used for; if phrase C is said, take a picture and say whether the person in it looks suspicious. This way you end up writing a separate script for every different kind of photo analysis.
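So for every phrase I basically copy the same automation and only change the trigger sentence and the prompt, something like this (again simplified, and the gpt4vision fields are the same assumption as above):

```yaml
# One automation per phrase - only the trigger sentence and the prompt change,
# the snapshot / gpt4vision / TTS steps are copy-pasted every time.
- alias: "Vision - what is this used for"
  trigger:
    - platform: conversation
      command:
        - "what is this used for"
  action:
    - service: camera.snapshot
      target:
        entity_id: camera.front_door
      data:
        filename: /config/www/tmp/snapshot.jpg
    - service: gpt4vision.image_analyzer   # same assumed service as above
      data:
        message: "Look at the object in the picture and explain what it is used for."
        image_file: /config/www/tmp/snapshot.jpg

- alias: "Vision - is this person suspicious"
  trigger:
    - platform: conversation
      command:
        - "does this person look suspicious"
  action:
    # ...the exact same snapshot / gpt4vision / TTS steps again,
    # just with a different prompt
```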
I'm looking for a way to avoid writing all these scripts. For example, could Extended OpenAI Conversation access the cameras directly, so that when we ask something like "What do you see in the camera?" it analyzes the camera image in real time with GPT-4o?
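What I imagine is something like one reusable function that the model can call with any prompt, instead of one script per phrase. The sketch below is only my guess at how it could look, based on the "- spec: / function:" format that Extended OpenAI Conversation uses for custom functions; whether the vision result can actually be returned to the conversation this way is exactly what I'm asking about:

```yaml
# Hypothetical single function for the "Functions" option of Extended OpenAI Conversation.
# analyze_camera is a made-up name; the gpt4vision service fields are the same
# assumption as in the automation above.
- spec:
    name: analyze_camera
    description: Take a snapshot from a camera and analyze it with GPT-4o vision.
    parameters:
      type: object
      properties:
        prompt:
          type: string
          description: What to look for, e.g. "How many people do you see?"
      required:
        - prompt
  function:
    type: script
    sequence:
      - service: camera.snapshot
        target:
          entity_id: camera.front_door
        data:
          filename: /config/www/tmp/snapshot.jpg
      - service: gpt4vision.image_analyzer   # assumed service, as above
        data:
          message: "{{ prompt }}"
          image_file: /config/www/tmp/snapshot.jpg
```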
Finally, I hope I have explained this clearly and that you understand me, because I used Google Translate ♥