
ChatGPT API with model gpt-4 is not using GPT-4. It's completely different from ChatGPT Plus GPT-4

Open ozgurozkan123 opened this issue 1 year ago • 9 comments

This isn't the right place for this issue, but it's more important than any issue in any repo you have, because it's literally lying. The API response says it uses GPT-3 when I query this:

ozgur@Ozgurs-MacBook-Pro ~ % curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer [REDACTED]" \
-d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "What is your model number!"}] }'

{"id":"chatcmpl-6wVWzn95O6iDc1Z9W1kJNIZZtZZLh","object":"chat.completion","created":1679402249,"model":"gpt-4-0314","usage":{"prompt_tokens":12,"completion_tokens":45,"total_tokens":57},"choices":[{"message":{"role":"assistant","content":"As an AI language model, I do not have a model number like a physical device or product would. I am powered by OpenAI's GPT-3, which stands for Generative Pre-trained Transformer 3."},"finish_reason":"stop","index":0}]}

Compare with what ChatGPT Plus GPT-4 says (see the attached screenshot).

ozgurozkan123 avatar Mar 21 '23 13:03 ozgurozkan123

A better comparison would be the OpenAI Playground, since ChatGPT (Plus) is pre-fed some system context that we don't know.
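For illustration, this is how a system message is supplied through the API; the actual system context ChatGPT (Plus) uses is not public, so the message below is purely hypothetical (Python sketch, assumes requests and OPENAI_API_KEY):

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4",
        "messages": [
            # Hypothetical system message; ChatGPT's real one is not known.
            {"role": "system", "content": "You are ChatGPT, a large language model based on the GPT-4 architecture."},
            {"role": "user", "content": "What is your model number!"},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])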

machinekoder avatar Mar 21 '23 13:03 machinekoder

I tried it in the OpenAI Playground as well (see the attached screenshot).

ozgurozkan123 avatar Mar 21 '23 13:03 ozgurozkan123

But it's billed as GPT-4 usage on the account (see the attached screenshot).

ozgurozkan123 avatar Mar 21 '23 13:03 ozgurozkan123

And the training data cutoff date it reports is 2020 (the same as GPT-3), not 2021 (see the attached screenshot).

(ChatGPT Plus's answer is shown in the attached screenshot.)

ozgurozkan123 avatar Mar 21 '23 13:03 ozgurozkan123

I believe ChatGPT with model gpt-4 (within the browser) is using GPT-4 with additional context.

I also believe the Chat API (not within the browser) with model gpt-4 (like in the playground) is also using GPT-4.

In short: the model's claim that it is GPT-3 is incorrect; it is really using some version of GPT-4.

My justification is that I am running a bunch of evals and gpt-4 outperforms gpt-3.5-turbo.

My experience using ChatGPT in the browser points the same way: the GPT-4 configuration outperforms the GPT-3 configuration and produces better (more correct) responses.
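A rough sketch of that kind of side-by-side check: send one prompt to both models and compare the answers by hand (the prompt is just an example; assumes requests and OPENAI_API_KEY):

import os
import requests

PROMPT = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. "
          "How much does the ball cost?")

for model in ("gpt-3.5-turbo", "gpt-4"):
    data = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": model, "messages": [{"role": "user", "content": PROMPT}]},
        timeout=60,
    ).json()
    print(model, "->", data["choices"][0]["message"]["content"])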

jonathanagustin avatar Mar 21 '23 15:03 jonathanagustin

My only remaining concern and question mark is the training data cutoff date. From an end user's perspective, a consumer-facing app that claims to use GPT-4 but answers as GPT-3 is a UX problem.

ozgurozkan123 avatar Mar 21 '23 20:03 ozgurozkan123

@ozgurozkan123

The issue appears to be: when the model is configured as GPT-4, is GPT-4 actually being used even though the output says it is not GPT-4?

I think the best analogy I can make is: if I trained a parrot to bark like a dog, that does not mean it is not a parrot.

If the configuration is set to GPT-4, GPT-4 is used even though the output states that it is not GPT-4.

For example, in your query:

ozgur@Ozgurs-MacBook-Pro ~ % curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer [REDACTED]" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "What is your model number!"}]
}'
{"id":"chatcmpl-6wVWzn95O6iDc1Z9W1kJNIZZtZZLh","object":"chat.completion","created":1679402249,"model":"gpt-4-0314","usage":{"prompt_tokens":12,"completion_tokens":45,"total_tokens":57},"choices":[{"message":{"role":"assistant","content":"As an AI language model, I do not have a model number like a physical device or product would. I am powered by OpenAI's GPT-3, which stands for Generative Pre-trained Transformer 3."},"finish_reason":"stop","index":0}]}

the query is set to use gpt-4:

"model": "gpt-4"

and the prompt is:

"What is your model number!"

and the output is:

"I am powered by OpenAI's GPT-3, which stands for Generative Pre-trained Transformer 3."

Here, there appears to be a factual error: the output says it's powered by GPT-3, but in fact it's powered by GPT-4. Although the output is inaccurate, it is still GPT-4. A model can answer self-identification questions incorrectly, but that does not change which model is being used. In the GPT-4 Technical Report, OpenAI states:

[Screenshot of a passage from the GPT-4 Technical Report stating that GPT-4 "is not fully reliable" and "makes reasoning errors".]

Here, OpenAI acknowledges that GPT-4 "is not fully reliable" and "makes reasoning errors". In this case, it made a reasoning error about what model it is.

OpenAI does not make any guarantees or promises about accuracy. When you use ChatGPT or OpenAI's API, you agree to OpenAI's Terms of use:

By using our Services, you agree to these Terms.

In Section 7(b) of the Terms of use, the Disclaimer states that they do not warrant that their services will be accurate or error free:

(b) Disclaimer. THE SERVICES ARE PROVIDED “AS IS.” EXCEPT TO THE EXTENT PROHIBITED BY LAW, WE AND OUR AFFILIATES AND LICENSORS MAKE NO WARRANTIES (EXPRESS, IMPLIED, STATUTORY OR OTHERWISE) WITH RESPECT TO THE SERVICES, AND DISCLAIM ALL WARRANTIES INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, SATISFACTORY QUALITY, NON-INFRINGEMENT, AND QUIET ENJOYMENT, AND ANY WARRANTIES ARISING OUT OF ANY COURSE OF DEALING OR TRADE USAGE. WE DO NOT WARRANT THAT THE SERVICES WILL BE UNINTERRUPTED, ACCURATE OR ERROR FREE, OR THAT ANY CONTENT WILL BE SECURE OR NOT LOST OR ALTERED.

jonathanagustin avatar Mar 22 '23 03:03 jonathanagustin

Here, OpenAI acknowledges that GPT-4 "is not fully reliable" and "makes reasoning errors". In this case, it made a reasoning error about what model it is.

It might be an interesting eval to verify whether a model is capable of knowing about itself without additional context.
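A standalone sketch of what that eval could look like: no system message, ask each model what it is, and compare its self-report against the "model" field the API returns (this is not in the evals registry format; the model names, prompt, and substring check are illustrative assumptions):

import os
import requests

for model in ("gpt-3.5-turbo", "gpt-4"):
    data = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": model,
            # Deliberately no system message: the point is self-knowledge without added context.
            "messages": [{"role": "user", "content": "Which OpenAI model are you? Answer with just the model name."}],
        },
        timeout=60,
    ).json()
    self_report = data["choices"][0]["message"]["content"]
    served_by = data["model"]  # the model that actually handled the request
    # Crude scoring: does the self-report mention the family it was served by?
    family = "gpt-4" if served_by.startswith("gpt-4") else "gpt-3.5"
    print(f"{model}: served_by={served_by!r}, self_report={self_report!r}, "
          f"correct={family in self_report.lower()}")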

machinekoder avatar Mar 22 '23 09:03 machinekoder

You would think that when the platform was written, it would have included the ID of the updated iteration; not so much the model being self-aware as the ID being embedded in the program.

Tkinfo11 avatar Mar 24 '23 04:03 Tkinfo11

As mentioned by @jonathanagustin, our models tend to hallucinate facts, and this case was most likely an instance of hallucination. However, it is possible (though the issue you noticed doesn't point to it) that the UI is actually wrong! If you suspect that might be the case, please contact our support team.

jwang47 avatar Mar 28 '23 18:03 jwang47