> Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think

Open Andy1996247 opened this issue 1 year ago • 4 comments

Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think

Would this work? https://llava-vl.github.io/

https://simonwillison.net/2023/Nov/29/llamafile/

Originally posted by @Andy1996247 in https://github.com/OthersideAI/self-operating-computer/issues/86#issuecomment-1849063383

Andy1996247 avatar Dec 11 '23 05:12 Andy1996247

From what I manually tested, it's not better than GPTV... I guess we have to wait for vision to improve and provide accurate coordinates https://github.com/OthersideAI/self-operating-computer/issues/7

or take a different approach.

BorisMolch avatar Dec 11 '23 18:12 BorisMolch

@BorisMolch even though Llava may not perform well, others may be interested to try it and see how they can improve it. If you want to make a PR for running Llava locally, I'd be happy to review it

joshbickett avatar Dec 12 '23 18:12 joshbickett

> @BorisMolch even though Llava may not perform well, others may be interested to try it and see how they can improve it. If you want to make a PR for running Llava locally, I'd be happy to review it

"Manually tested" as in I gave Llava screenshots and tried to see if it's capable of instructing. It's not (as well as GPTV).

BorisMolch avatar Dec 12 '23 19:12 BorisMolch

Oh ok, understood. Well, thanks for the input nonetheless!

joshbickett avatar Dec 12 '23 19:12 joshbickett

We now have Llava integrated thanks to the PR from @michaelhhogue!

joshbickett avatar Feb 09 '24 04:02 joshbickett