FastChat
Any plans for 30B version?
Any plans for fine-tuning a 30B version? Either fine-tuning it directly or with LoRA?
Yes, @ZYHowell is looking into this. But we first need to investigate whether 30B w/ LoRA improves chatbot performance over 13B w/o LoRA; otherwise it does not make sense to spend compute training a worse bot.
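For anyone curious what a 30B LoRA pass would roughly involve, here is a minimal sketch using HuggingFace PEFT. This is not the maintainers' actual training code; the weight path, rank, and target modules below are placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "/path/to/llama-30b-hf"  # placeholder: converted LLaMA-30B weights
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# common LoRA settings; the real rank/alpha/target modules may differ
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices get gradients
# ...then run a normal fine-tuning loop (e.g. HF Trainer) on the conversation data

The appeal is that only the adapter weights are trained, so the optimizer state for a 30B model stays small.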
I guess we should see improvement. @zhisbug
Here is what I get from Alpaca 30B with LoRA (4-bit):

And this is what I currently get using Vicuna 13B 4-bit:

Apparently the 30B does much better at logical deduction.
Both are one-shot answers.
@davidliudev is this the MetaIX one, aka Chansung's 30B Alpaca LoRA + GPTQ? @zhisbug I read somewhere, comparing (30B 4-bit vs 13B 16-bit?), that at the same RAM requirement you should almost always prefer the higher parameter count over the lower parameter count at higher precision.
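As a rough back-of-envelope check on that claim (my own arithmetic, counting weight memory only and ignoring activations and the KV cache):

params_30b, params_13b = 30e9, 13e9
mem_30b_4bit = params_30b * 0.5 / 2**30   # ~14 GiB at 4 bits per parameter
mem_13b_fp16 = params_13b * 2.0 / 2**30   # ~24 GiB at 16 bits per parameter
print(f"30B 4-bit ≈ {mem_30b_4bit:.0f} GiB, 13B fp16 ≈ {mem_13b_fp16:.0f} GiB")

So at the same memory budget the quantized 30B actually fits more easily than a 16-bit 13B, which is why the higher parameter count tends to win.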
I guess we should directly fine-tune LLaMA to get a base model, then test whether LoRA can further improve things, then GPTQ. ML magic ╰(°▽°)╯
@Originalimoc No, I used this one: https://huggingface.co/elinas/alpaca-30b-lora-int4, which is based on this LoRA: https://huggingface.co/baseten/alpaca-30b. Loaded via Text Generation WebUI and GPTQ.
Can I please have the training data so I can finetune my own vicuna 30B and 65B
> Can I please have the training data so I can finetune my own vicuna 30B and 65B
Second this, we would like to try the same :-)
@davidliudev, I just tried https://huggingface.co/elinas/alpaca-30b-lora-int4 with alpaca-30b-4bit-128g.safetensors
This is what I get:

And similar hallucinations with vicuna.
Could you share your parameters please?
Edit: Unchecking "do_sample" helps get rid of the hallucinations, but alpaca-30b-4bit-128g is still not good at logic:

> And this is what I currently get using Vicuna 13B 4-bit:
I'm trying Vicuna 13B with 8-bit quantization and got the answers below using the FastChat CLI, which makes me think the Vicuna 13B model can handle some deductive reasoning too. The answers I get are a lot more verbose than the others posted above (a rough sketch of the 8-bit loading is included after the transcript):
Human: If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?
Assistant: If you have a basket of potatoes and a box of cake on top of it, and you want to get one potato, you can follow these steps:
- First, carefully remove the cake from the basket and place it on a surface nearby.
- Then, reach into the basket and carefully remove one potato, being mindful not to disturb the other potatoes in the basket.
- If you want to make sure you get the potato you want, you can take the time to identify it before you remove it from the basket.
- Once you have the potato in your hand, you can either return the other potatoes to the basket or place them somewhere else where they will not be disturbed.
- Finally, you can either eat the potato, store it for later use, or use it in a recipe.
... and then the follow-up question asking for the rationale behind its sequencing:
Human: Why do I need to remove the cake first?
Assistant: You don't necessarily need to remove the cake from the basket before getting a potato, but it may be necessary if you want to get a specific potato or if you want to avoid damaging the potatoes or the cake. Here are a few reasons why you might want to remove the cake before getting a potato:
- If you want to get a specific potato: If you know which potato you want to get, you might want to remove any other potatoes or objects that are in the way so that you can easily reach the one you want.
- To avoid damaging the potatoes or the cake: If you are not careful when reaching into the basket, you could accidentally damage the potatoes or the cake. By removing the cake first, you can avoid accidentally squishing it or dislodging any potatoes while you are trying to get the one you want.
- To keep the basket organized: If you want to keep the basket organized, you might want to remove any objects that are not potatoes before getting the one you want. This can help you to easily find the potato you want and prevent any confusion or frustration.
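For anyone wanting to reproduce an 8-bit setup outside the FastChat CLI, here is a minimal transformers + bitsandbytes sketch. It assumes you already have merged Vicuna-13B weights locally; the path is a placeholder and this is not FastChat's own loading code.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/vicuna-13b"  # placeholder: merged Vicuna-13B weights
tokenizer = AutoTokenizer.from_pretrained(path)
# load_in_8bit needs the bitsandbytes package and a CUDA GPU
model = AutoModelForCausalLM.from_pretrained(path, load_in_8bit=True, device_map="auto")

prompt = "If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))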
Watching this in case any developments regarding fine tuning 65B arise...
@davidliudev, I think it's still possible to do logical reasoning with the current model size. This is what I get with the 4-bit model from here: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit using llama.cpp.
> If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?
To get one potato from the basket with a box of cake on top, you would need to remove the box of cake first. This can be done by carefully lifting the box off the potatoes using your hands or a tool, such as a spatula or a fork. Once the box is removed, you can then access the potatoes and remove one from the basket.
### Human:
> Why can't I get one directly?
It is not possible to get one potato directly from the basket with a box of cake on top without first removing the box of cake. This is because the box is blocking access to the potatoes, and you would need to move or remove the box in order to reach the potatoes. It is not safe or feasible to try to get one potato directly from the basket while it is still covered by the box.
Thanks for the input from the other tests. I am not using a scientific method for this comparison, so expect it to be a pretty rough test, and I may be wrong. Maybe someone can do a more scientific test, or we can just wait for a true 30B version and see how it performs.
From my side, the 30B Alpaca seems far more stable and has a higher chance of giving a good result. On the other hand, Vicuna is better at following instructions but occasionally hallucinates.
My parameters: max_new_tokens: 200, prompt size in tokens: 2048,
temperature: 0.95, repetition penalty: 1.1, top_p: 1, top_k: 40
I sometimes get good results from Vicuna for tricky questions, but on my setup it is not one-shot, meaning I need to click regenerate a few times to get a good answer for logical deduction questions.
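For reference, those UI settings correspond to a plain transformers generate() call roughly like the following (a sketch only; model, tokenizer, and inputs are assumed to be loaded as in the 8-bit example further up):

out = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,          # sampling on; do_sample=False would give greedy decoding instead
    temperature=0.95,
    repetition_penalty=1.1,
    top_p=1.0,
    top_k=40,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))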
Some more tests using above parameters:
Alpaca 30b:

Vicuna 13b (this time 8 bit)

Here's what my local 4-bit model printed:
> If my Bluetooth earphone is broken, shall I see otologist or dentist?
If your Bluetooth earphone is broken, you should seek help from an electronic repair technician or a specialist who can fix or diagnose problems with electronic devices. An otolaryngologist, also known as an ear, nose, and throat (ENT) specialist, is a medical doctor who treats conditions related to the ear, nose, and throat, but they would not be able to help you with a broken Bluetooth earphone. A dentist, on the other hand, would not be able to help with a broken Bluetooth earphone either.
Not as good and neat as 30B Alpaca, but there is still good logical reasoning.
And here are my parameters:
main -i --color --interactive-first -t 8 -b 256 --top_k 12 --top_p 1 --temp 0.36 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.05 --instruct -m models\vicuna\ggml-vicuna-13b-4bit-rev1.bin -r "### Human:" -f prompts\vicuna.txt
sampling: temp = 0.360000, top_k = 12, top_p = 1.000000, repeat_last_n = 64, repeat_penalty = 1.050000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 97
main: seed = 1681297316
llama.cpp: loading model from models\vicuna\ggml-vicuna-13b-4bit-rev1.bin
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
system_info: n_threads = 8 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Human:'
Interesting, this is what I get with vicuna 13b 4b (llama.cpp):
> If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?
To get one potato from the basket of potatoes with a box of cake on top, you can follow these steps:
1. Remove the box of cake from the basket.
2. Lift the basket and remove the potatoes from the basket, one at a time.
3. If there are other items in the basket, you can either remove them first or carefully lift the basket while holding the potato, avoiding any other items that may be inside.
4. Once you have removed all the potatoes from the basket, you can put the basket back in its original place or set it aside if you need to access it again.
Alternatively, you can gently lift the box of cake and the basket together, carefully taking the potato out from underneath the cake without disturbing it. However, this may be more challenging depending on the size and weight of the box of cake and the basket.
@mrsipan Very nice output. Could you share your llama.cpp parameters?
@mdawid This is it:
./main --color --threads 7 --batch_size 256 --n_predict -1 --top_k 12 --top_p 1 --temp 0.36 \
--repeat_penalty 1.05 --ctx_size 2048 --instruct --reverse-prompt "### Human:" \
--model models/13b/ggml-vicuna-13b-4bit-rev1.bin \
-f prompts/vicuna.txt
where vicuna.txt contains:
A chat between a curious human and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the human's questions.
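For context, the Vicuna-v0 conversation format these pieces approximate looks roughly like this (an assumption based on the system text above and the "### Human:" marker seen in the outputs, not FastChat's exact template code):

system = ("A chat between a curious human and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the human's questions.")
question = "If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?"
# assumed separator style; the real template may differ slightly
prompt = f"{system}\n### Human: {question}\n### Assistant:"
print(prompt)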
@ZYHowell would you have any update for us about a potential 30B version? :)
Hopefully we'll have a 30b-lora version by this week
cool
> @mdawid This is it:
> ./main --color --threads 7 --batch_size 256 --n_predict -1 --top_k 12 --top_p 1 --temp 0.36 \
> --repeat_penalty 1.05 --ctx_size 2048 --instruct --reverse-prompt "### Human:" \
> --model models/13b/ggml-vicuna-13b-4bit-rev1.bin \
> -f prompts/vicuna.txt
> where vicuna.txt contains:
> A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
Thanks for actually giving your full command line input. This is essential to getting similar results via testing.
> Hopefully we'll have a 30b-lora version by this week
@ZYHowell any updates?
bump ;)
Hihi @ZYHowell do you have any updates so far?
Hi everyone, I understand the excitement for the 30B version. While we eagerly await the update, let's try to be patient and supportive of the maintainers. Remember, they're working hard and don't have any obligation to meet specific deadlines. If you're able, consider offering help to the community instead of posting follow-up messages. Thanks for your understanding! 😊
EDIT: You don't have to wait for this. Check out WizardLM-30B or GPT4-X-Alpasta-30b.
@musabgultekin thank you for replying. I wonder if there is a way for us to follow the fine-tuning process, perhaps in a dev channel. This is so interesting. Great work on the 13B version!
Would it be possible to benchmark Open Assistant's 30B LLaMA oasst fine-tune (https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor)? I have yet to run anything quantitative like perplexity, but it feels smarter. 4-bit quantized, it is able to run on 22GB of VRAM. In general, I think the newer Open Assistant models should probably be evaluated; they are on the 8th fine-tune of Pythia-12B now.
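On the perplexity point, a standard sliding-window evaluation with transformers looks roughly like this (a sketch only; the model path, eval text, and stride are placeholders, and this is not a benchmark of the oasst model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/model"              # placeholder
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16, device_map="auto")

text = open("wiki.test.raw").read()  # placeholder eval text
enc = tokenizer(text, return_tensors="pt")
max_len, stride = 2048, 512
nlls, n_tokens, prev_end = [], 0, 0
for begin in range(0, enc.input_ids.size(1), stride):
    end = min(begin + max_len, enc.input_ids.size(1))
    trg_len = end - prev_end                      # tokens actually scored in this window
    input_ids = enc.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100               # mask the overlapping context tokens
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    n_tokens += trg_len
    prev_end = end
    if end == enc.input_ids.size(1):
        break
print("perplexity:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())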
So, is there any release plan for Vicuna 30B as of now?
I'm also curious....
bump