Evaluation on computer vision benchmarks

Open finitearth opened this issue 3 years ago • 2 comments

Are there plans to evaluate the vision modality of GPT-4? I am interested to know how GPT-4 performs on classification tasks with zero- and few-shot learning, and how it compares to vision-only models. If the few-shot learning capabilities of LLMs translate to other modalities, this would be a real game changer.

A question out of curiosity: how was the vision modality incorporated? Maybe similar approaches could be taken for other modalities, such as audio or video. That would be an interesting open-source project for sure :)

Mar 16 '23 10:03 finitearth

I have an engineering exam bank of about 1000 questions with simple illustrations. I already have the questions in JSONL format, but some of them rely on the image to be answered correctly.
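
A minimal sketch of how the image-dependent questions could be separated from the text-only ones, so that the text-only subset can be evaluated on its own. The record layout is an assumption: the `image` field name and the `split_by_image_dependency` helper are hypothetical and not part of any existing framework.

```python
import json


def split_by_image_dependency(lines, image_key="image"):
    """Split JSONL lines into (text_only, needs_image) lists of records.

    Assumption: a record depends on an image when it has a non-empty
    value under `image_key` (a hypothetical field name).
    """
    text_only, needs_image = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get(image_key):
            needs_image.append(record)
        else:
            text_only.append(record)
    return text_only, needs_image


if __name__ == "__main__":
    # Small in-memory example instead of a real file.
    sample = [
        '{"question": "What is the stress in the beam shown?", "image": "beam.png"}',
        '{"question": "Define Young\'s modulus."}',
    ]
    text_only, needs_image = split_by_image_dependency(sample)
    print(len(text_only), "text-only,", len(needs_image), "image-dependent")
```

Because an open file object yields its lines when iterated, the same function could be called on a file handle directly.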

Apr 05 '23 04:04 MoreTore

Currently our API doesn't support vision, but if that changes, we'll definitely add support for it to this framework!

Apr 13 '23 18:04 jwang47