gptme icon indicating copy to clipboard operation
gptme copied to clipboard

Add vision support

Open ErikBjare opened this issue 7 months ago • 0 comments

Since the OpenAI API now has vision in beta, and we could use LLaVa locally.

Might be a lot of work, or might be super easy.

Question is, what would it be useful for?

  • #51: Xvfb to understand display/output and make a E2E desktop agent
  • #52: Screenshot with browser tool
    • Can be used to take screenshots of developed webapps for visually-aided autodebugging
  • Have it review plot outputs for correctness and to inspect results
    • Could be useful for data science, but reading a good plain text output might still be superior

ErikBjare avatar Nov 28 '23 15:11 ErikBjare