alpaca-electron
Whatever I try, no model loads
I downloaded the models from the link provided on the version 1.05 release page, but whatever I try it always says "couldn't load model". I use the ggml-model-q4_0.bin file but nothing loads. I tried Windows and Mac. It doesn't give me a proper error message, it just says "couldn't load model".
You need to download the q4_1 file, not q4_0.
I used the following link: https://huggingface.co/Pi3141/alpaca-7b-native-enhanced/blob/main/ggml-model-q4_1.bin. It doesn't work, it just says it can't load.
I tried so many models and they either fail to load or never write anything at all. I used Kobold and the models work fine, so I don't know what I'm doing wrong. I like this tool a lot but it has never actually worked for me.
Where exactly did you get the models from?
From the link on the releases page: https://huggingface.co/Pi3141
And you're using q4_1, right?
I tried this one: https://huggingface.co/Pi3141/gpt4-x-alpaca-native-13B-ggml/blob/main/ggml-model-q4_1.bin
Can you try Alpaca native enhanced? https://huggingface.co/Pi3141/alpaca-7b-native-enhanced
Maybe you can share the terminal log if you're on Mac or Linux; that would make things clearer.
That one works, I guess it's just really slow? Also, it doesn't seem to take into account other stuff running on my PC: it's running at 100%, my music now has little skips in the audio, and my PC is unstable.
I don't remember the Kobold UI being this extreme; I could multitask with other stuff. Also, Kobold shows the tokens being read in real time, which was really good feedback that it was doing something, but with Alpaca Electron I can't tell if the window is stuck or actually working. I really wish there were some text down here that said "Processing Characters: 1 of 5000"
or something like that; it would improve the usability by 200%.
This is just kind of annoying to look at; it doesn't tell me anything and just made me assume it was frozen.
I'll consider adding a characters-processed counter. Most of this stuff is down to llama.cpp, though. I have no control over the CPU usage. I'm just making the frontend for it.
I think you should add it, or you're going to get more people reporting the models as broken.
Actually, I can't. Llama.cpp doesn't report how many tokens of the prompt have been processed.
To cut down on people reporting the model as broken, I'll make it a rule that you can't open an issue unless you've waited at least an hour for a response from the model, to make sure it's not just your computer.
If a model can't be loaded, the app will notify you. It only freezes in rare edge cases.
I tried all these models and none of them work; everything just says "couldn't load model". How do I find the terminal logs? I am using the macOS arm64 build.
Bruh, nobody is ever gonna wait one hour, they'll just find another tool.
Yeah, good luck to them finding a different tool that's faster than llama.cpp. If llama.cpp takes that long to run for them, their CPU spec is probably not good, so it would also make sense that they don't have a GPU, or that their GPU isn't powerful enough.
Where can I find the terminal logs on Mac?
Sorry, I haven't tested it on Mac before. I just assume that when you run the command in a terminal, it will display some info like this:
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
Reverse prompt: '### Instruction:
That's normal, it's loading the model. Give it some time.
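In case it helps anyone on macOS who asked about the terminal logs: one way to see that kind of output from the packaged app (untested here, and the exact bundle and executable names are assumptions, so check what's actually inside Contents/MacOS) is to launch the app from Terminal instead of Finder, so llama.cpp's log lines print right in the terminal window:

ls "/Applications/Alpaca Electron.app/Contents/MacOS/"              # find the real executable name first
"/Applications/Alpaca Electron.app/Contents/MacOS/Alpaca Electron"  # run it directly; the loading log prints here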
Hey, I had the same problem on Linux (Fedora Silverblue 38), and when I tried to compile it myself it worked! I'm also guessing this is the same issue as: https://github.com/ItsPi3141/alpaca-electron/issues/24 and https://github.com/ItsPi3141/alpaca-electron/issues/51
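For what it's worth, here is a rough sketch of the compile-it-yourself route, assuming the repo follows the usual Node/Electron workflow (the actual script names may differ, so check the project's README and package.json):

git clone https://github.com/ItsPi3141/alpaca-electron.git   # grab the source
cd alpaca-electron
npm install                                                   # install the JavaScript dependencies
npm start                                                     # launch the app (script name is an assumption)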
What's the difference between q4_1.bin, q4_2.bin, q4_3.bin, etc.?