S. Dale Morrey

Results: 36 comments by S. Dale Morrey

> So what you're telling me is that it's got a 32-bit ARM CPU on it with 2 cores. I doubt there's much advantage offloading to that. Plus having to...

I'm confused

```
ollama run --verbose deepseek-coder-v2:16b-lite-instruct-q8_0
>>> /show info
  Model
    arch                deepseek2
    parameters          15.7B
    quantization        Q8_0
    context length      163840
    embedding length    2048
```

It looks like the training context...

Ok so I figured this out on my own with a little help from deepseek-coder-v2:16b-lite-instruct-q8_0. The context length reported is the maximum length the model can support even in theory....

I do this with a parameter over the openai api, just follow the openai docs for the REST API. It works.

> [@rick-github](https://github.com/rick-github) thanks it worked
>
> Also, one more thing, do you know how I set the `dimensions` `Qwen3-Embedding-8B-GGUF` generates 4096 tokens currently i want only `1024`

It's MRL...

> > It's MRL like nomic-embed-text this means you just truncate to the size you want since the most important vectors are first. I'm getting better results on Qwen3-0.6b embed...
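The truncation described above can be sketched in a few lines of Python. This is only an illustration: `truncate_embedding` is a hypothetical helper name, the sample vector is a made-up stand-in for a real 4096-dimension embedding, and rescaling to unit length after truncation is an assumption added here (it keeps cosine and dot-product scores comparable) and is not something the comments state.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components of an MRL-style embedding,
    then rescale the result to unit length (the rescaling is an assumption)."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Made-up 4-dimensional stand-in for a full-size embedding.
full = [3.0, 4.0, 12.0, 0.5]
short = truncate_embedding(full, 2)
print(short)  # [0.6, 0.8]
```

For the case in the thread, the same call would be `truncate_embedding(vector, 1024)` on each 4096-dimension vector, applied identically to documents and queries so that both live in the same truncated space.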