klosax
Great! :)
How much of the work done in this repo could easily be transferred to future models and architectures? It feels like the happy days of the original LLaMA models...
> I was actually able to convert, quantize and load the model, but there is some tensor math to debug and modify but I have no 40GB gpu to debug...
@nikisalli: On the [model card](https://huggingface.co/tiiuae/falcon-40b#model-architecture-and-objective) it says `head_dim` is 64 ("Reduced to optimise for FlashAttention"), but in the config.json the number is 128. Maybe try reducing it to 64?
Generation speed for the StoryWriter model: at token 1000, about 300 ms per token; at token 8000, about 2500 ms per token. So if the number of tokens generated is increased 8 times, the...
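For what it's worth, the ratio between those two measurements is close to the growth in context length, which would be consistent with the per-token cost growing roughly linearly with the number of tokens already in context (a rough reading of just these two data points, not a profile):

$$
\frac{2500\ \text{ms}}{300\ \text{ms}} \approx 8.3 \approx \frac{8000}{1000}
$$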
We may need the new [Sophia Optimizer](https://arxiv.org/abs/2305.14342), which reports roughly a 2x increase in training speed compared to Adam.
The architecture that Falcon uses is different from those currently supported. More discussion here: https://github.com/ggerganov/llama.cpp/issues/1602
Falcon LLM ggml framework with CPU and GPU support: https://github.com/cmp-nct/ggllm.cpp
> This actually changes a bit more than that PR; feel free to close though

I will close #206.
> gpt_neox_model_load: ggml ctx size = 17592186043162.29 MB

It seems to be a calculation error with signed and unsigned integers. Change `int` to `size_t` in [these](https://github.com/ggerganov/ggml/blob/758471b22630cc037244dbe1961a87097988aa75/examples/gpt-neox/main.cpp#LL159C1-L162C45) lines:

```
const int...
```
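For illustration, here is a minimal standalone sketch of that kind of signed/unsigned mix-up, using made-up hyperparameter values rather than the actual ones from main.cpp: a product computed in 32-bit `int` overflows to a negative value, and adding it to a `size_t` accumulator wraps it around to something close to 2^64, which is the same ballpark as the absurd MB figure above. Doing the arithmetic in `size_t` from the start avoids it.

```cpp
// Sketch of the suspected overflow (hypothetical sizes, not the actual
// hyperparameters or the exact lines from main.cpp).
#include <cstdio>
#include <cstddef>

int main() {
    // made-up hyperparameters, chosen so the int product overflows
    const int n_ctx   = 2048;
    const int n_layer = 80;
    const int n_embd  = 16384;

    size_t ctx_size = 0;

    // 2048 * 80 * 16384 = 2684354560 > INT_MAX: the int product overflows
    // (technically undefined behaviour, in practice it wraps to a negative
    // value), and the negative int then converts to a huge size_t.
    const int n_elements = n_ctx * n_layer * n_embd;
    ctx_size += n_elements * sizeof(float);

    printf("int arithmetic:    %.2f MB\n", ctx_size / (1024.0 * 1024.0));

    // the proposed fix: keep the whole computation in size_t
    ctx_size = (size_t) n_ctx * n_layer * n_embd * sizeof(float);
    printf("size_t arithmetic: %.2f MB\n", ctx_size / (1024.0 * 1024.0));

    return 0;
}
```

With these made-up numbers the first line prints something on the order of 10^13 MB (same order of magnitude as the reported value), while the second prints the intended 10240 MB.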