Loreto Parisi
> Yeah, I have always wondered why ADAM is considered state-of-the-art

Adam or AdamW? The latter should be preferred...
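To make the distinction concrete, here is a minimal sketch (plain NumPy, illustrative parameter names, not any library's actual optimizer): Adam folds weight decay into the gradient as an L2 penalty, so the decay gets rescaled by the adaptive learning rate, while AdamW applies the decay directly to the weights, decoupled from the moment estimates.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One optimizer step (t is the 1-based step count).
    decoupled=False ~ Adam with L2 regularization,
    decoupled=True  ~ AdamW (weight decay applied directly to the weights)."""
    if not decoupled:
        # Adam: weight decay is folded into the gradient (L2 penalty),
        # so it ends up rescaled by the adaptive step size below.
        grad = grad + weight_decay * theta
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        # AdamW: the decay is applied separately, independent of the moments.
        theta = theta - lr * weight_decay * theta
    return theta, m, v
```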
> Here is the flash attention that I've tried without gaining any performance: [ggerganov/llama.cpp#778](https://github.com/ggerganov/llama.cpp/pull/778)
>
> As a side note, today I was intrigued by the "multi-query" attention paper that...
Oh wow! Interestingly, there is a more recent Multi-Query Attention implementation by the MosaicML team for MPT-7B [here](https://huggingface.co/mosaicml/mpt-7b-chat/blob/main/attention.py#L174). I did not know they were actually using Multi-Query Attention for...
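For reference, a minimal NumPy sketch of the idea (shapes and names are illustrative, not the MPT code): multi-query attention keeps one query projection per head but shares a single key/value head across all heads, which shrinks the KV cache and speeds up incremental decoding.

```python
import numpy as np

def multi_query_attention(x, Wq, Wk, Wv, n_heads):
    """x: (seq, d_model); Wq: (d_model, n_heads * d_head);
    Wk, Wv: (d_model, d_head) -- a single shared K/V head (the MQA trick)."""
    seq, d_model = x.shape
    d_head = Wq.shape[1] // n_heads

    q = (x @ Wq).reshape(seq, n_heads, d_head)   # per-head queries
    k = x @ Wk                                   # (seq, d_head), shared by all heads
    v = x @ Wv                                   # (seq, d_head), shared by all heads

    # Causal mask: -inf strictly above the diagonal blocks future positions.
    mask = np.triu(np.full((seq, seq), -np.inf), 1)

    out = np.empty_like(q)
    for h in range(n_heads):
        scores = q[:, h, :] @ k.T / np.sqrt(d_head) + mask   # (seq, seq)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
        out[:, h, :] = weights @ v                           # same K/V for every head
    return out.reshape(seq, n_heads * d_head)
```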
@ggerganov is this the correct command?

```
./embedding -m models/7B/ggml-model-q4_0.bin -p "ciao" -n 512
```

It seems it's not using the prompt passed with `-p`. In fact, I do not see in...
I have the same issue, I cannot convert [alpaca-lora](https://github.com/antimatter15/alpaca.cpp) models. I had to check out a previous commit first:

```
git checkout 5cb63e2493c49bc2c3b9b355696e8dc26cdd0380
```
@eiz okay, thanks. Where do I find the tokenizer file?
Confirmed it worked for both LLaMA and Alpaca 7B. 🥇
> > Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want...
@eiz It seems there is a problem with the Alpaca 13B: after conversion, when loading, it complains about the embedding size:

```
main: seed = 1679320340
llama_model_load: loading model from...
```
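For context, the 13B model uses a wider embedding than 7B (n_embd = 5120 vs. 4096, per the published LLaMA hyperparameters), so a mismatch at load time usually means the converted file's header still reports the wrong width. A tiny illustrative sanity check (the function and constants here are my own, not part of the convert script or the llama.cpp loader):

```python
# Expected embedding width (n_embd) per LLaMA model size,
# from the published LLaMA hyperparameters.
EXPECTED_N_EMBD = {"7B": 4096, "13B": 5120, "30B": 6656, "65B": 8192}

def check_embedding_size(model_size: str, n_embd_in_file: int) -> None:
    """Illustrative check: a 13B checkpoint whose header reports a 7B-sized
    embedding (4096 instead of 5120) usually means the wrong model files,
    or a partially/incorrectly converted model, were loaded."""
    expected = EXPECTED_N_EMBD[model_size]
    if n_embd_in_file != expected:
        raise ValueError(
            f"{model_size}: expected n_embd={expected}, got {n_embd_in_file}"
        )

check_embedding_size("13B", 5120)  # OK
```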
Maybe of interest to us: https://github.com/TimDettmers/bitsandbytes
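For example, bitsandbytes backs the `load_in_8bit` path in Hugging Face transformers. A minimal sketch (the model name is illustrative, and this assumes a CUDA GPU with bitsandbytes installed, not anything specific to llama.cpp):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # illustrative; any causal LM on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_8bit uses bitsandbytes' LLM.int8() quantization under the hood,
# roughly halving memory versus fp16 weights.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
)

inputs = tokenizer("ciao", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```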