self-operating-computer
> Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think
Would this work? https://llava-vl.github.io/
https://simonwillison.net/2023/Nov/29/llamafile/
Originally posted by @Andy1996247 in https://github.com/OthersideAI/self-operating-computer/issues/86#issuecomment-1849063383
From what I manually tested, it's not better than GPT-4V... I guess we have to wait for vision models to improve and provide accurate coordinates (see https://github.com/OthersideAI/self-operating-computer/issues/7),
or take a different approach.
@BorisMolch even though Llava may not perform well, others may be interested to try it and see how they can improve it. If you want to make a PR for running Llava locally, I'd be happy to review it
"Manually tested" as in I gave Llava screenshots and tried to see if it's capable of giving instructions. It's not (and neither is GPT-4V).
Oh ok, understood. Well, thanks for the input nonetheless!
We now have Llava integrated thanks to the PR from @michaelhhogue!