Loreto Parisi

Results 293 comments of Loreto Parisi

> Yeah, I have always wondered why ADAM is considered state-of-the-art

Adam or AdamW? The latter should be preferred...
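For reference, the practical difference is easy to see in code. This is only a minimal PyTorch sketch (not the llama.cpp training code): Adam folds weight decay into the gradient as L2 regularization, while AdamW applies decoupled weight decay directly to the weights, which is why it tends to generalize better.

```
import torch

w = torch.randn(10, requires_grad=True)

# Adam: weight decay is folded into the gradient, so it gets rescaled
# by the adaptive learning rate together with the loss gradient.
opt_adam = torch.optim.Adam([w], lr=1e-3, weight_decay=0.01)

# AdamW: weight decay is applied directly to the weights, independently
# of the adaptive step (decoupled weight decay).
opt_adamw = torch.optim.AdamW([w], lr=1e-3, weight_decay=0.01)

loss = (w ** 2).sum()
loss.backward()
opt_adamw.step()
```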

> Here is the flash attention that I've tried without gaining any performance: [ggerganov/llama.cpp#778](https://github.com/ggerganov/llama.cpp/pull/778)
>
> As a side note, today I was intrigued by the "multi-query" attention paper that...

Oh wow! Interestingly, there is a more recent Multi-Query Attention implementation by the MosaicML team for MPT-7B [here](https://huggingface.co/mosaicml/mpt-7b-chat/blob/main/attention.py#L174). I did not know they were actually using Multi-Query Attention for...
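To illustrate the idea (this is a minimal sketch, not the MPT code): all query heads share a single key/value head, so the KV cache shrinks by a factor of `n_heads` compared to standard multi-head attention, which is what makes it attractive for inference.

```
import torch
import torch.nn.functional as F

def multi_query_attention(x, wq, wk, wv, n_heads):
    # Sketch of multi-query attention: n_heads query heads, one shared KV head.
    B, T, d_model = x.shape
    head_dim = d_model // n_heads

    q = (x @ wq).view(B, T, n_heads, head_dim).transpose(1, 2)  # (B, H, T, hd)
    k = (x @ wk).view(B, T, 1, head_dim).transpose(1, 2)        # (B, 1, T, hd)
    v = (x @ wv).view(B, T, 1, head_dim).transpose(1, 2)        # (B, 1, T, hd)

    # k and v broadcast over the head dimension, so the KV cache is
    # n_heads times smaller than in regular multi-head attention.
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5          # (B, H, T, T)
    attn = F.softmax(scores, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(B, T, d_model)
    return out

# Example: d_model = 64, 8 query heads, one shared KV head of size 8.
x = torch.randn(2, 16, 64)
wq = torch.randn(64, 64)
wk = torch.randn(64, 8)   # projects to a single head
wv = torch.randn(64, 8)
y = multi_query_attention(x, wq, wk, wv, n_heads=8)
```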

@ggerganov is this the correct command?

```
./embedding -m models/7B/ggml-model-q4_0.bin -p "ciao" -n 512
```

It seems it's not using the prompt passed with `-p`. In fact, I do not see in...

I have the same issue, I cannot convert [alpaca-lora](https://github.com/antimatter15/alpaca.cpp) models. I had to check out a previous commit instead:

```
git checkout 5cb63e2493c49bc2c3b9b355696e8dc26cdd0380
```

@eiz okay thanks, where do I find the tokenizer file?

Confirmed, it worked for both LLaMA and Alpaca 7B. 🥇

> > Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want...

@eiz It seems there is a problem with the Alpaca 13B model: after conversion, when loading, it complains about the embedding size:

```
main: seed = 1679320340
llama_model_load: loading model from...
```

Maybe of interest to us: https://github.com/TimDettmers/bitsandbytes
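For example, one thing bitsandbytes gives you is an 8-bit Adam optimizer as a drop-in replacement for `torch.optim.Adam`, which cuts optimizer-state memory substantially. A minimal sketch (assumes a CUDA-enabled PyTorch setup; not related to llama.cpp itself):

```
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# 8-bit Adam from bitsandbytes as a drop-in replacement for torch.optim.Adam.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024).cuda()
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```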