
How good is the 65B model? Anyone tested it?

Open elephantpanda opened this issue 1 year ago • 9 comments

I have tried the 7B model, and while it's definitely better than GPT-2, it is not quite as good as any of the GPT-3 models. This is somewhat subjective. How do the other models (13B, ..., 65B, etc.) compare?

For example the 7B model succeeds with the prompt

The expected response for a highly intelligent computer to the input "What is the capital of France?" is "

but fails with the more tricky:

The expected response for a highly intelligent computer to the input "Write the alphabet backwards" is "

Has anyone got examples where it shows the difference between the models?

P.S. Is there a better place to discuss these things than the issues section of GitHub? We need a Discord server.

elephantpanda avatar Mar 08 '23 11:03 elephantpanda

LLaMA is not instruction-tuned, nor was it RLHF'd, so it doesn't act as an agent by default.

In order for it to do what you want, you have to be much more precise in your prompts. Put a character description at the top as a header, give a few examples, and then go on with your actual question.
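As a minimal sketch of that kind of prompt assembly (the header wording and example pairs below are made up for illustration, not taken from the repo), in Python it could look like:

# Hypothetical sketch: build a header + few-shot prompt for a base
# (non-instruction-tuned) model. The header text and examples are placeholders.
def build_prompt(question: str) -> str:
    header = ("A dialog where User interacts with AI. AI is helpful, kind, "
              "honest, and answers concisely.\n")
    # A couple of worked examples so the base model continues the pattern.
    examples = [
        ("What is the capital of France?", "The capital of France is Paris."),
        ("Name three primary colors.", "Red, yellow, and blue."),
    ]
    shots = "".join(f"User: {q}\nAI: {a}\n" for q, a in examples)
    # End with the real question and an open "AI:" turn for the model to complete.
    return f"{header}{shots}User: {question}\nAI:"

print(build_prompt("Write the alphabet backwards."))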

There are a bunch of related Discord servers already, btw.

MrBIMC avatar Mar 08 '23 14:03 MrBIMC

I was able to run 65B, very slowly, and it looks like sometimes it gives me really cool generations and sometimes completely irrelevant ones. For prompting I am using something like:

A dialog where User interacts with AI. AI is helpful, kind, obedient, honest, likes to answer and knows its own limits.
User: Hello, AI.
AI: Hello! How can I assist you today?
User: Give me a recipe for a hot curry. It must involve rice. The ingredients list should be in metric. This will be my final inquiry.
AI: 

I tried the same prompt in Cyrillic too, and it seems the dataset contains enough of it, because the model really did give me a shawarma recipe with chicken, tomato, vegetables, and yoghurt. 7B, 13B, and 30B were not able to complete the prompt, producing unrelated text about shawarma; only 65B gave something relevant. Then the model got stuck in a loop on the following tokens and I stopped the generation. The prompt (translated from Russian) was:

The user is chatting with a smart AI model. The AI always answers in detail, is intelligent, has broad knowledge, and willingly answers questions.
User: Hi!
AI: Hi! How can I help?
User: Write me a detailed recipe for making shawarma.
AI: 

It seems we need to feed the model longer prompts. Also, it does well on summarization tasks in Cyrillic too. I gave a prompt with some facts and a question at the end, and it replies well in most cases. The prompt was like:

This is _guy_name_. He lives in _cityname_. He works at _workname_, has a _brandname_ car and two kids. (a bit more text)
Where does _guy_name_ live?
or: How many kids does _guy_name_ have?
or: What brand is _guy_name_'s car?
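A small sketch of how such fact-plus-question prompts could be assembled programmatically (every name and fact below is an invented placeholder, not from the original test):

# Hypothetical sketch: fill the fact/question template above.
# All names and facts here are invented placeholders.
facts = {
    "guy_name": "Alex",
    "cityname": "Lisbon",
    "workname": "a bakery",
    "brandname": "Toyota",
}

context = ("This is {guy_name}. He lives in {cityname}. He works at "
           "{workname}, has a {brandname} car and two kids.").format(**facts)

questions = [
    "Where does {guy_name} live?",
    "How many kids does {guy_name} have?",
    "What brand is {guy_name}'s car?",
]

for q in questions:
    prompt = f"{context}\n{q.format(**facts)}\n"
    print(prompt)  # each prompt would be fed to the model separately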

randaller avatar Mar 08 '23 15:03 randaller

LLaMA is not instruction-tuned, nor was it RLHF'd, so it doesn't act as an agent by default.

In order for it to do what you want, you have to be much more precise in your prompts. Put a character description at the top as a header, give a few examples, and then go on with your actual question.

There are a bunch of related Discord servers already, btw.

I was able to run 65B using 122GB of disk and a machine with 8x 3090s, each using about 23.5GB of VRAM. Prompt+response sizes were up to 512 tokens, and each run took about 16-20 seconds. I think @MrBIMC is correct here regarding the agent issue and prompt specifications, as I noticed that even 65B was not great at being cooperative. However, even when it was being cooperative, logic-based questions still had it stumped.
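For rough intuition on those numbers (my own back-of-the-envelope arithmetic, not figures from the comment above): 65B parameters at fp16 is about 130 GB of weights, consistent with the ~122 GB checkpoint on disk, and sharded across 8 GPUs that leaves each card holding roughly 16 GB of weights plus activations, KV cache, and overhead:

# Rough, assumption-laden estimate of the 65B fp16 footprint per GPU
# when the weights are sharded across 8 cards (model parallelism).
params = 65e9          # parameter count
bytes_per_param = 2    # fp16
n_gpus = 8

weights_total_gb = params * bytes_per_param / 1e9
weights_per_gpu_gb = weights_total_gb / n_gpus

print(f"total fp16 weights: ~{weights_total_gb:.0f} GB")   # ~130 GB
print(f"weights per GPU:    ~{weights_per_gpu_gb:.1f} GB") # ~16 GB
# The gap up to the ~23.5 GB per card observed above would be activations,
# KV cache, and framework overhead (a guess, not a measured breakdown).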

Just an example of 65B output (it basically gets it right by the end!):

PROMPT: Fix the following Python code in order to generate 1305135 random integers the range from -3 to 31531, inclusive.

\begin{code} np.random.gauss(3,764,34) \end{code} =================== {'responses': ["Fix the following Python code in order to generate 1305135 random integers the range from -3 to 31531, inclusive.\n\n\begin{code}\nnp.random.gauss(3,764,34)\n\end{code}\n\nThe error is \begin{code}\nTypeError: () missing 2 required positional arguments: 'size' and 'random_state'\n\end{code}\n\nComment: Please, post code, data and errors as text, not images. Also, read [ask] to learn how to ask questions properly.\n\nComment: You forgot to call the function.\n\nAnswer: np.random.gauss() is a function that generates a gaussian random variable, you cannot use it to generate a random integer between -3 and 31531.\n\nYou can use numpy.random.randint():\n\n\begin{code}\nimport numpy as np\nnp.random.randint(low=-3, high=31531, size=1305135)\n\end{code}\n\nOutput:\n\n\begin{code}\narray([-2, 6, 27, -9, 0, 2, 5, -4, -3, 3, 6, 2, -9, 4, 2, 12, 1, 24,\n 29, -7, -8, 4, -2, 11, -1, -5, -4, 22, 7, 24, 21, 6, 11, 2, 23, 2,\n 27, 31, 5, 9, 8, 7, 29, 1"]}

Note the "Please, post code, data and errors as text, not images. Also, read [ask] to learn how to ask questions properly." part, a typical StackOverflow response.

EonSin avatar Mar 09 '23 09:03 EonSin

@randaller did you run it all in RAM, or did you manage to use it with a swap file on SSD?

barleyj21 avatar Mar 09 '23 19:03 barleyj21

@randaller did you run it all in RAM, or did you manage to use it with a swap file on SSD?

@barleyj21 128 GB of RAM + 256 GB of swap on a PCIe 4.0 NVMe drive

randaller avatar Mar 10 '23 08:03 randaller

You can also run LLaMA 65B (a bit slow but not terrible) on a CPU with 128GB of RAM using llama.cpp.

See the discussion at https://github.com/ggerganov/llama.cpp/issues/34

neuhaus avatar Mar 12 '23 22:03 neuhaus

You can also run LLaMA 65B (a bit slow but not terrible) on a CPU with 128GB of RAM using llama.cpp.

You don't need 128GB of RAM; 65B runs on a CPU with only 48GB of RAM, without swap, using llama.cpp.

leszekhanusz avatar Mar 13 '23 11:03 leszekhanusz

Unable to run 65B on Windows, bouhhh.

sushi-hackintosh avatar Apr 10 '23 08:04 sushi-hackintosh

You can also run LLaMA 65B (a bit slow but not terrible) on a CPU with 128GB of RAM using llama.cpp.

You don't need 128GB of RAM; 65B runs on a CPU with only 48GB of RAM, without swap, using llama.cpp.

Yes, the quantized version of the model. It should only cause a small quality degradation.
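As a back-of-the-envelope check (my own arithmetic, assuming roughly 4-bit quantization of the kind llama.cpp commonly uses, not figures from this thread), the quantized 65B weights come to about 32.5 GB, which is why 48 GB of RAM is enough:

# Rough estimate of the quantized 65B footprint; the 4-bit figure and the
# overhead guess are assumptions, not measurements from this thread.
params = 65e9
bits_per_param = 4      # e.g. a q4_0-style 4-bit quantization
weights_gb = params * bits_per_param / 8 / 1e9
print(f"~{weights_gb:.1f} GB of quantized weights")  # ~32.5 GB
# A few extra GB for the KV cache and runtime buffers still fits
# comfortably in 48 GB of RAM, vs. ~130 GB at fp16.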

neuhaus avatar Apr 11 '23 12:04 neuhaus

Closing since it is not an issue.

WuhanMonkey avatar Sep 06 '23 17:09 WuhanMonkey