
7B 13B 30B Comparisons

Open enzyme69 opened this issue 1 year ago • 5 comments

I am testing a few models on my machine, M2 Mac.

At first, I tried 13B. It's slightly slow, but not bad: 5-7 words per second. The answers are actually pretty good. It's not yet ChatGPT, as I couldn't get a proper answer about Blender Python, but it's pretty good at general Q&A.

I thought 7B would be faster, but somehow its responses and answers are disappointing. I deleted it right away.

30B... is a bit too slow for this machine.

I wonder how we can refine a model to make it run faster and stay more precise on topic?

enzyme69 avatar Apr 11 '23 13:04 enzyme69

"how we can refine a model" - I think this depends on the model you're using, not the program. Can the author refine all models? Not sure.
"precise on topic" will only be possible if parameters like "temp" and "top-p" are added as controls in this program (like in other text-generation UI tools). For example, if you run your models through console tools, you do have the ability to control base params like these.
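To illustrate what those two parameters actually do (a generic sketch of how temperature and top-p/nucleus sampling are commonly applied to a model's output logits, not this program's actual code):

```python
import math
import random

def sample_with_temp_top_p(logits, temp=0.8, top_p=0.9, rng=random):
    # Temperature: divide logits before softmax. temp < 1 sharpens the
    # distribution (more "precise on topic"), temp > 1 flattens it
    # (more creative, more random).
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p (nucleus): keep only the smallest set of highest-probability
    # tokens whose cumulative probability reaches top_p, then sample
    # from that renormalized subset.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    r = rng.random() * norm
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a sharply peaked distribution and a moderate top_p, only the top token survives the nucleus cut, so sampling becomes deterministic.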

DogVanDog avatar Apr 11 '23 13:04 DogVanDog

Hmm... yes, I need to investigate the model more, but I am pretty happy with 13B. It seems pretty smart, just under ChatGPT.

[Screenshot 2023-04-11 at 11:54:20 pm]

However, this 7B model is definitely broken: https://huggingface.co/Pi3141/alpaca-lora-7B-ggml/blob/main/ggml-model-q4_1.bin https://huggingface.co/Pi3141/alpaca-lora-7B-ggml/resolve/main/ggml-model-q4_1.bin

enzyme69 avatar Apr 11 '23 13:04 enzyme69

Is there a recommendation for a Llama or Alpaca model that's the most creative / better at coding?

enzyme69 avatar Apr 12 '23 09:04 enzyme69

[Screenshot 2023-04-12 at 10:06:20 pm]

On many occasions, if we ask questions that begin with the same words, it repeats the previous answer without even thinking. It's a bug.
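One common mitigation in llama.cpp-style runners is a repeat penalty that discourages recently generated tokens. A minimal sketch of that rule (assuming direct access to the logits, which this app doesn't currently expose - positive logits are divided by the penalty, negative ones multiplied by it):

```python
def apply_repeat_penalty(logits, recent_tokens, penalty=1.1):
    """Penalize tokens that already appeared in the recent context.

    Mirrors the common llama.cpp-style rule: a positive logit is divided
    by the penalty, a negative logit is multiplied by it, so a repeated
    token always becomes less likely. penalty = 1.0 disables the effect.
    """
    out = list(logits)
    for t in set(recent_tokens):
        if out[t] > 0:
            out[t] /= penalty
        else:
            out[t] *= penalty
    return out
```

If the UI ever surfaces a "repeat penalty" slider alongside temp/top-p, raising it above 1.0 should reduce this kind of verbatim repetition.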

enzyme69 avatar Apr 12 '23 12:04 enzyme69

Are there any compatible models other than the basic 7b/13b/30b from here?

https://huggingface.co/Pi3141

I've got 30b running on my 5900x/64GB RAM desktop and it's actually pretty usable - maybe 2-3 words per second. I wasn't sure how well that would work.

Curious if there's anything else I can try out. I don't know that much about LLMs, but most don't seem to be in this ggml/.bin format. I searched for GGML on HuggingFace, but none of the models (at least the ones I'm interested in) seem to work. I assume they're in the "old format" the model loader references.
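For what it's worth, you can often tell which GGML container variant a .bin file uses from its first four bytes. This is a hedged sketch based on the magic values ggml/llama.cpp used at the time ('lmgg' for old unversioned ggml, 'fmgg' for ggmf, 'tjgg' for the newer ggjt), so treat the exact mapping as an assumption:

```python
import struct

# Little-endian uint32 magics from ggml/llama.cpp (assumed values):
MAGICS = {
    0x67676D6C: "ggml (old, unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-able, newer)",
}

def ggml_variant(path):
    """Read the 4-byte magic at the start of a model file and name it."""
    with open(path, "rb") as f:
        raw = f.read(4)
    if len(raw) < 4:
        return "not a ggml file"
    (magic,) = struct.unpack("<I", raw)
    return MAGICS.get(magic, "unknown magic: 0x%08x" % magic)
```

If a downloaded model reports the old unversioned magic, that would line up with the loader's "old format" warning.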

EDIT: Actually I've found one that works : https://huggingface.co/verymuchawful/Alpacino-13b-ggml

mlbrnm avatar Apr 17 '23 13:04 mlbrnm