self-operating-computer
Added support for more Ollama models
What does this PR do?
This PR adds support for any multimodal model on Ollama by asking the user which model they would like to use after running operate -m ollama. It also refactors some code in a non-breaking way (partly because my IDE did it automatically) and adds some rudimentary OCR support for models from Ollama. Previously, llava might as well have been clicking and typing at random.
It also adds a --browser flag whose value is passed into the prompts given to the model, for users who want to use a different browser.
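For reviewers who want a quick picture of the flow described above, here is a minimal sketch, not the code in this PR. The function names (`build_parser`, `resolve_model`, `browser_hint`), the default values, and the fallback to llava on empty input are all assumptions made for illustration; only the -m ollama option and the --browser flag come from the description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser resembling `operate` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="operate")
    # Assumed default; the real default model may differ.
    parser.add_argument("-m", "--model", default="gpt-4o",
                        help="model backend to use, e.g. 'ollama'")
    # Assumed default browser name; the real default may differ.
    parser.add_argument("--browser", default="Google Chrome",
                        help="browser name to mention in the model prompts")
    return parser


def resolve_model(args: argparse.Namespace, ask=input) -> str:
    """Return the model name, asking the user only when -m ollama was given."""
    if args.model != "ollama":
        return args.model
    name = ask("Which Ollama model would you like to use? ").strip()
    # Hypothetical fallback: use llava when the user just presses Enter.
    return name or "llava"


def browser_hint(browser: str) -> str:
    """Build the sentence that would be added to the prompt (hypothetical wording)."""
    return f"Use {browser} when a task requires a web browser."
```

Passing `ask` as a parameter keeps the prompt testable without a terminal, for example `resolve_model(args, ask=lambda _: "llava:13b")`.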
Requirement/Documentation
I made this feature request a few weeks ago.
Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] This change requires a documentation update (already done in README.md)
Mandatory Tasks
- [x] Make sure you have self-reviewed the code. A decent size PR without self-review might be rejected.

I haven't run a test using evaluate.py since I don't have an OpenAI API key and the cost to unlock gpt-4o is a bit high for me. It would be really nice if someone else could do it!
Pretty new to programming, so please excuse some bad practices I might've followed.
Duplicate PR.