self-operating-computer > Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think

Open Andy1996247 opened this issue 1 year ago • 4 comments

Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think

Would this work? https://llava-vl.github.io/

https://simonwillison.net/2023/Nov/29/llamafile/

Originally posted by @Andy1996247 in https://github.com/OthersideAI/self-operating-computer/issues/86#issuecomment-1849063383

Dec 11 '23 05:12 Andy1996247

From what I manually tested, it's not better than GPTV... I guess we have to wait for vision models to improve and provide accurate coordinates (https://github.com/OthersideAI/self-operating-computer/issues/7), or take a different approach.

Dec 11 '23 18:12 BorisMolch

@BorisMolch even though Llava may not perform well, others may be interested to try it and see how they can improve it. If you want to make a PR for running Llava locally, I'd be happy to review it

Dec 12 '23 18:12 joshbickett

> @BorisMolch even though Llava may not perform well, others may be interested to try it and see how they can improve it. If you want to make a PR for running Llava locally, I'd be happy to review it

"Manually tested" as in I gave Llava screenshots and tried to see if it's capable of instructing. It's not (at least not as well as GPTV).

Dec 12 '23 19:12 BorisMolch

oh ok, understood. Well thanks for the input nonetheless!

Dec 12 '23 19:12 joshbickett

We now have Llava integrated thanks to the PR from @michaelhhogue!

Feb 09 '24 04:02 joshbickett
