phalexo
Not sure what you mean by storing a token. If you navigate to Hugging Face and try to access the Llama models, it will ask you to go through a quick process....
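For reference, a minimal sketch of that process once access has been granted, assuming the `huggingface_hub` CLI is installed (the repo id below is just an example, and the local token path varies by version):

```bash
pip install -U huggingface_hub
# Log in with an access token from https://huggingface.co/settings/tokens;
# the CLI stores it locally (typically under ~/.cache/huggingface/token)
huggingface-cli login
# After accepting the license on the model page, gated downloads work
huggingface-cli download meta-llama/Llama-2-7b-hf
```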
> hey there, does this happend during troubleshoot or debugging ?

Not sure what the difference is between "troubleshoot" and debugging. It certainly happens often when it "asks" me if...
You see how here it realizes that it created a bunch of unnecessary files, with duplicated code?

```bash
Dev step 206
After reviewing the task implementation, I found some issues...
```
Ollama has a history file in the ~/.ollama folder. Does ollama constantly parse that cache?
> The default `mixtral` Modelfile only offloads like 22 layers, as noted previously.

For people with 24GB of VRAM, I have found that the `q3_K_S` model can be completely offloaded...
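A sketch of how to force full offload with a custom Modelfile, assuming the `q3_K_S` tag below matches what you pulled; `num_gpu` is the Ollama parameter for how many layers to send to the GPU, and a value above the actual layer count just means "as many as possible":

```bash
cat > Modelfile <<'EOF'
FROM mixtral:8x7b-instruct-v0.1-q3_K_S
# Offload all layers to the GPU instead of the default split
PARAMETER num_gpu 99
EOF
ollama create mixtral-gpu -f Modelfile
ollama run mixtral-gpu
```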
Is there any indication that someone is looking into this? Also, what effect does the LLAMA_CUDA_FORCE_MMQ=on setting have on performance? If the optimized cuBLAS...
I don't think we should confuse two separate problems. Sometimes there is really not enough VRAM. Sometimes you run into the cuBLAS 15 error, which was introduced starting with v0.1.12. Which...
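One rough way to tell the two apart: watch VRAM while the model loads. A genuine out-of-memory failure shows memory filling up first, whereas the cuBLAS error (status 15 is CUBLAS_STATUS_NOT_SUPPORTED) shows up even with plenty of VRAM free.

```bash
# In a second terminal while the model loads
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```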
I am running on the Maxwell architecture.

```bash
git clone --recursive https://github.com/jmorganca/ollama.git
cd ollama/llm/llama.cpp
vi generate_linux.go
```

```go
//go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_CUDA_FORCE_MMQ=on
//go:generate cmake...
```
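After that edit, a sketch of the rebuild, assuming the repo's usual Go toolchain workflow:

```bash
cd ../..            # back to the repository root
go generate ./...   # re-runs the edited //go:generate cmake directives
go build .          # produces a local ./ollama binary with MMQ forced on
./ollama serve
```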
Do you have tensor cores on your GPU? I doubt it.
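A quick way to check, assuming a reasonably recent driver (the `compute_cap` query field is missing from very old nvidia-smi versions); tensor cores first appeared with compute capability 7.0 (Volta):

```bash
nvidia-smi --query-gpu=name,compute_cap --format=csv
# e.g. "GeForce GTX 980, 5.2" -> Maxwell, no tensor cores
```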
You have to rebuild with LLAMA_CUDA_FORCE_MMQ=on; performance may be a bit worse, but it would work.