Mark Schmidt

95 comments by Mark Schmidt

Does reducing top_p to something like 0.3 or even 0.1 provide better output for these larger models?
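For anyone unfamiliar with the knob, here is a minimal NumPy sketch of nucleus (top-p) sampling (not llama.cpp's actual sampler; the temperature and top_p defaults are just illustrative). Lowering top_p shrinks the candidate pool the sampler draws from:

```python
import numpy as np

def top_p_sample(logits, top_p=0.3, temperature=0.8, rng=np.random.default_rng()):
    """Nucleus (top-p) sampling: keep the smallest set of tokens whose
    cumulative probability reaches top_p, then sample from that set."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                  # tokens by probability, descending
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                            # lower top_p -> fewer candidates kept
    return rng.choice(keep, p=probs[keep] / probs[keep].sum())

# With top_p=0.1 only the most likely token usually survives, so output gets
# more deterministic; whether that is "better" depends on the task.
```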

Without https://github.com/WebAssembly/memory64 implemented in WebAssembly, you are going to run into show-stopping memory issues with the current 4GB limit due to 32-bit addressing. Do you have a plan...
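Back-of-the-envelope numbers for why 32-bit addressing is the blocker (parameter count rounded to 7B):

```python
# wasm32 linear memory is indexed with 32-bit offsets, so one module tops out at:
wasm32_limit = 2 ** 32                      # 4 GiB
# LLaMA-7B weights alone at f16 are far past that ceiling:
f16_weights = 7e9 * 2                       # 7B parameters * 2 bytes each
print(f"wasm32 ceiling: {wasm32_limit / 2**30:.0f} GiB, "
      f"7B f16 weights: ~{f16_weights / 2**30:.1f} GiB")   # ~13 GiB > 4 GiB
```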

I think that's a reasonable proposal, @Dicklesworthstone. A purely 3-bit implementation of llama.cpp using GPTQ could retain acceptable performance and solve the same memory issues. There's an open issue for...

![llama.cpp on Samsung S22 Ultra at 1.2 tokens per second](https://user-images.githubusercontent.com/5949853/224798872-d3a1e9d8-d0ce-4261-b1a8-247c2a154a9f.png) 1.2 tokens/s on a Samsung S22 Ultra running 4 threads. The S22 obviously has a more powerful processor. But I...

Ah, yes. A 3-bit implementation of 7B would fit fully in 4GB of RAM and lead to much greater speeds. This is the same issue as in https://github.com/ggerganov/llama.cpp/issues/97. 3-bit support...
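Rough arithmetic behind that claim, ignoring the per-block quantization scales and any layers kept in higher precision, so treat these as lower bounds:

```python
params = 7e9                                # LLaMA-7B, rounded
for bits in (4, 3):
    weights_gib = params * bits / 8 / 2**30
    print(f"{bits}-bit 7B weights: ~{weights_gib:.2f} GiB")
# 4-bit: ~3.26 GiB, 3-bit: ~2.44 GiB. The 3-bit variant leaves noticeably more
# headroom under a 4 GiB budget for the KV cache and scratch buffers.
```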

@octoshrimpy I believe Mestrace is saying you should convert and quantize the model on a desktop computer with a lot of RAM first, then move the ~4GB 4-bit quantized model...

Python Bindings for llama.cpp: https://pypi.org/project/llamacpp/0.1.3/ (not mine, just found them)

Since people in this thread are interested in Instruct models, I recommend checking out ChatGLM-6B. I believe it is more capable than Flan-UL2 with just 6B parameters. I have a...

> @MarkSchmidty useful reference. Thanks

From what I have observed, GLM is mostly ignored due to it being weaker with English prompts. But it may turn out to be better...

That is the GPU memory required to run inference, not the model size. ![](https://user-images.githubusercontent.com/5949853/226741304-4fe963d8-3c42-4404-b761-f6fb3316a0fe.png) The official int4 model is 4.06GB on HuggingFace before any pruning.

> It would help if there...
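A rough way to see why the inference figure is larger than the 4.06GB file: the weights are only part of the footprint. The layer count and hidden size below are ChatGLM-6B's published values; the context length and overhead figure are my own assumptions:

```python
# Crude inference-memory estimate: weights + KV cache + runtime overhead.
weights_gib = 4.06                          # int4 checkpoint size on disk
n_layers, d_model = 28, 4096                # ChatGLM-6B architecture
context, bytes_per_value = 2048, 2          # fp16 K and V entries (assumed context length)
kv_cache_gib = 2 * n_layers * context * d_model * bytes_per_value / 2**30
overhead_gib = 0.5                          # scratch buffers, CUDA context, etc. (guess)
print(f"KV cache: ~{kv_cache_gib:.2f} GiB, "
      f"total: ~{weights_gib + kv_cache_gib + overhead_gib:.2f} GiB")
```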