theaerotoad
Right--it looks like both `main.cpp` and `server.cpp` implement self-extend themselves, _not_ through anything exposed in `llama.h`. I think the simplest implementation of it appears in [passkey.cpp](https://github.com/ggerganov/llama.cpp/blob/515f7d0d4fce41c752fc253acf30707c3be2531e/examples/passkey/passkey.cpp#L142C1-L151C42). Something like: ```cpp ... //...
I'm excited about this one, and was attempting to combine it with [Vulkan](https://github.com/leejet/stable-diffusion.cpp/issues/256). I'm seeing a compile-time issue (around the pingpong function) in my merge, and it seems it's in...
> @theaerotoad just out of curiosity, which C++ compiler are you using? MSVC had no issue with this code (which I believe was technically incorrect).

Tested it on gcc 12.2.0-14...
Yup, removing the pingpong endpoint allows compilation. Another thought--the default 'localhost' string didn't work on my end initially. Looks like `llama.cpp` server defaults to using 127.0.0.1 instead of 'localhost', so...
> Maybe you could try on the CPU backend to see if the segfault is related to the Vulkan merge or to the server itself? (Also you should probably use...
@stduhpf Yup, that fixes it. Thank you! Sure nice not to have to reload everything each time.
@stduhpf -- This is working pretty well; I played around with it a bit this weekend. I have a few tweaks to enable other inputs to be specified (via html...
@Green-Sky -- would this enable flash attention for Vulkan builds as well?