Chi Kim
Is there a disadvantage? If not, shouldn't it be enabled by default for everyone, except on systems that can't support it?
Does Ollama support any embedding models yet? If so, which ones, and where can I get them?
I know llama.cpp is designed with the CPU in mind, but is there a Python library to run quantized GGML models on Colab with a GPU for faster results?
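For reference, this is roughly what I mean, a minimal sketch assuming llama-cpp-python with GPU offload (the install flag can vary by version, and the model filename is just a placeholder):

```python
# Install llama-cpp-python with GPU support on Colab first, e.g.
#   !CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
from llama_cpp import Llama

# The model path is a placeholder; point it at your quantized model file.
llm = Llama(model_path="llama-2-7b.Q4_K_M.gguf", n_gpu_layers=-1)  # -1 = offload all layers to the GPU

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```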
@jmorganca could you expose these API endpoints in Ollama? The llama.cpp server has POST /tokenize and POST /detokenize. https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md Then we could just count the number of tokens after tokenizing.
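For example, a rough sketch of the token counting I have in mind, assuming a llama.cpp server running on its default port 8080:

```python
import requests

# POST the text to the llama.cpp server's /tokenize endpoint and count the tokens it returns.
resp = requests.post(
    "http://localhost:8080/tokenize",
    json={"content": "How many tokens is this sentence?"},
)
tokens = resp.json()["tokens"]
print(f"{len(tokens)} tokens")
```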
Make sure you specify a seed, so the random generator uses the same seed every time.
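Something like this, assuming a local Ollama server on the default port (the model name and prompt are just placeholders):

```python
import requests

# A fixed seed plus temperature 0 should make the output repeatable across runs.
resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {"seed": 42, "temperature": 0},
})
print(resp.json()["response"])
```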
Awesome! Does it work for the API as well?
I already did, even before Ollama had it. I'm requesting it for other people.
Try 0.1.32. It works nicely for me. https://github.com/ollama/ollama/releases
I'm using a package that uses tqdm to print progress to the console. Is there a way to capture the output without modifying the package I'm using, so I can display it on...
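One workaround I'm considering, assuming tqdm writes to stderr (its default) and the bars are created while the redirect is active:

```python
import contextlib
import io
import time
from tqdm import tqdm

buf = io.StringIO()
# Redirecting stderr captures tqdm's progress output without touching the package itself.
with contextlib.redirect_stderr(buf):
    for _ in tqdm(range(3)):
        time.sleep(0.1)

print("captured:", buf.getvalue())
```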
It seems like setting no_stream fixed the issue with clicking regenerate. However, if you have generation attempts set to 5, it sounds like whenever it generates a new response, it interrupts the audio and...