klosax
This is the first step towards a unified LLM API and interface that would handle any supported architecture. https://github.com/ggerganov/llama.cpp/issues/1602#issuecomment-1568215353 https://github.com/ggerganov/ggml/issues/185 https://github.com/ggerganov/ggml/pull/145#issuecomment-1544733902
> > `general.architecture: String`: describes what architecture this model implements. Values can include llama, mpt, gpt-neox, gpt-j, gpt-2, bloom, etc.
> >
> > It might make more sense to make something...
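For illustration, a minimal sketch of how a unified loader could dispatch on the `general.architecture` value. The names below are made up for the example; this is not the actual llama.cpp loader:

```
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical: model metadata parsed from the file as string key/value pairs.
using Metadata = std::map<std::string, std::string>;

struct Model { /* weights, hyperparameters, ... */ };

// One loader per supported architecture, registered under its
// general.architecture value.
static const std::map<std::string, std::function<Model(const Metadata&)>> loaders = {
    { "llama",    [](const Metadata&) { return Model{}; /* build llama graph */ } },
    { "gpt-neox", [](const Metadata&) { return Model{}; /* build gpt-neox graph */ } },
    { "mpt",      [](const Metadata&) { return Model{}; /* build mpt graph */ } },
};

Model load_model(const Metadata& md) {
    const std::string arch = md.at("general.architecture");
    const auto it = loaders.find(arch);
    if (it == loaders.end()) {
        throw std::runtime_error("unsupported architecture: " + arch);
    }
    return it->second(md);
}
```

Adding a new architecture then only means registering another entry; the rest of the API stays the same for every model file.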
> That being said, that reminds me - it might be a good idea to include suggested prompt formats as one of the standardised config parameters. Feel free to +1...
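As a sketch of what a standardised prompt-format parameter could buy: an executor could substitute the user prompt into a template stored in the file. The key name and placeholder syntax here are assumptions, nothing in the spec discussion fixes them yet:

```
#include <iostream>
#include <string>

// Hypothetical stored template, e.g.
// general.prompt_format = "### Instruction:\n{prompt}\n\n### Response:\n"
std::string apply_prompt_format(std::string tmpl, const std::string& prompt) {
    const std::string placeholder = "{prompt}";
    const std::string::size_type pos = tmpl.find(placeholder);
    if (pos != std::string::npos) {
        tmpl.replace(pos, placeholder.size(), prompt);
    }
    return tmpl;
}

int main() {
    const std::string tmpl = "### Instruction:\n{prompt}\n\n### Response:\n";
    std::cout << apply_prompt_format(tmpl, "Hiking is") << std::endl;
}
```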
> `vocabulary.huggingface_tokenizer_json: String`: the entirety of the HF tokenizer.json for a given model. Optional, but highly recommended for best tokenization quality with supported executors.

Why would json give a...
> I wasn't aware of the existence of other ways to store the tokenization data, and I'd have to look into it. Do you have any further information about it...
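The proposal is just to embed the HF tokenizer.json verbatim as one string-valued metadata entry, so nothing is lost in conversion. A conversion-side sketch (the metadata store here is a stand-in, not a real writer API):

```
#include <fstream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Read tokenizer.json verbatim; the whole file becomes a single
// string value, so no tokenizer detail is lost in conversion.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

int main() {
    std::map<std::string, std::string> metadata;  // hypothetical writer-side store
    metadata["vocabulary.huggingface_tokenizer_json"] = read_file("tokenizer.json");
}
```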
With a 7B model the freeze lasts about 5 seconds; with a 30B model, about 20 seconds. I tried using --no-mmap with the 30B model and the system froze for 5 minutes(!) right...
The prompt eval time is 2.5 times slower also. Release 305eb5a output:
```
./main -m ../llama-33b-supercot-ggml-q5_1.bin -c 2048 -p "Hiking is" -n 16 -t 6
main: seed = 1682775656
llama.cpp:...
```
Thanks. So it seems to be related to Ubuntu and/or AMD CPUs. I'm running Ubuntu 20.04 with an AMD Ryzen 5 CPU.
I found out what the problem is: the model did not fit into RAM. With the b1ee8f5 release it works even if the model doesn't fit in RAM, but...
Maybe implement a parameter to disable pinned memory, since the previous version worked fine with swapped memory.
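A sketch of what such an opt-out could look like. The flag semantics and the CUDA page-locked allocation are assumptions on my part, the thread doesn't pin down how pinning is done; the point is only the fallback to pageable memory:

```
#include <cstddef>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical opt-out: fall back to ordinary pageable memory when the
// user disables pinning (e.g. via a --no-pinned flag), so the OS can
// swap the buffer out instead of locking it in RAM.
void * alloc_host_buffer(std::size_t size, bool use_pinned) {
    if (use_pinned) {
        void * ptr = nullptr;
        if (cudaMallocHost(&ptr, size) == cudaSuccess) {
            return ptr;  // page-locked: fast DMA transfers, never swappable
        }
        // pinned allocation failed (e.g. not enough lockable RAM): fall through
    }
    return std::malloc(size);  // pageable: slower transfers, but swappable
}
```

The caller would also have to remember which path was taken, since pinned buffers must be released with cudaFreeHost while pageable ones use free.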