Josh Leverette

132 comments by Josh Leverette

@Confuze setting the `num_gpu` parameter to 10000 forces _more_ layers onto the GPU, not fewer. Mixtral has 33 layers. You just have to keep lowering that number...
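As a rough sketch of what "lowering that number" looks like in practice, the offload count can be set per-model in an Ollama Modelfile via `PARAMETER num_gpu` (the model name and the value 28 below are illustrative; you would decrease the value until the model fits in VRAM):

```
# Hypothetical Modelfile: cap GPU offload below Mixtral's 33 layers
FROM mixtral

# num_gpu = number of layers offloaded to the GPU; lower it until loading succeeds
PARAMETER num_gpu 28
```

The same option can also be passed per-request as `"options": {"num_gpu": 28}` in the Ollama HTTP API.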

@madsamjp With a 4090, you should be able to offload all 33 layers of the 3-bit quantized models and get 50+ tokens per second. If you want to run the...

Part of the appeal of LWM is that it does support video, but I don’t think there’s any way to use it with videos in ollama currently.

I don’t have commit access to this repo, so I can’t reopen this issue, but it might be worth keeping it open for now.

I'm intermittently seeing the exact same extra-space issue in the few minutes I've spent testing the extension in PyCharm, and the extra space before the suggestion...

Here is an example of the kind of JSON output I've seen before this PR:

```json
{
  "bullet_points": [
    {
      "text": "Some text here, clipped for brevity of the example"
      ...
```

Given that we're in the `impl Period`, is there any momentum toward making forward progress on this? In addition to the heavily discussed stuff, this issue seems to be a...

This was probably the main issue for this kind of thing: https://github.com/ollama/ollama/issues/1952#issuecomment-2105376333 I would probably leave a comment there too. Since you're on AMD, it's not actually related to CUDA,...

https://github.com/ollama/ollama/issues/4245 https://github.com/ollama/ollama/issues/4221

https://github.com/ggerganov/llama.cpp/pull/7225 just merged, so I wonder if it's time to get the 128k models added to the library as well.