earl-of-embedding

Results: 5 comments by earl-of-embedding

Theory: Could it be that the context Ollama gives the Codex CLI is too short? Why do I think that? If I run gpt-oss:20b, long documents also derail Codex (Ubuntu Linux machine). "ollama...
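One way to test that theory outside of Codex, as a minimal sketch (assuming Ollama is listening on its default port 11434 and `curl` is installed): send a single request to Ollama's /api/generate endpoint with `options.num_ctx` set to 131072, then check whether `ollama ps` reports a larger loaded context. If it does, the default context length is what was derailing long documents.

```rust
use std::process::Command;

fn main() {
    // Ask the local Ollama server (default port 11434) to load gpt-oss:20b with
    // an explicit 128k context via the documented `options.num_ctx` field.
    let body = r#"{
        "model": "gpt-oss:20b",
        "prompt": "ping",
        "stream": false,
        "options": { "num_ctx": 131072 }
    }"#;

    let out = Command::new("curl")
        .args(["-s", "http://localhost:11434/api/generate", "-d", body])
        .output()
        .expect("failed to run curl");

    println!("{}", String::from_utf8_lossy(&out.stdout));
}
```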

I am still scratching my head. https://github.com/openai/codex/blob/main/codex-rs/core/src/openai_model_info.rs clearly states that 128k should be used for context:

```rust
pub(crate) fn get_model_info(model_family: &ModelFamily) -> Option<ModelInfo> {
    let slug = model_family.slug.as_str();
    match slug...
```
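For readers who don't want to open the file, here is a self-contained paraphrase of the shape of that lookup, not the upstream code: the struct fields and the exact gpt-oss arm are assumptions based on what the comment above describes (a 128k, i.e. 131072-token, context window keyed on the model slug).

```rust
// Paraphrased sketch of the lookup shape; field names and values are
// illustrative, not copied from codex-rs.
struct ModelFamily {
    slug: String,
}

#[derive(Debug)]
struct ModelInfo {
    context_window: u64,
}

fn get_model_info(model_family: &ModelFamily) -> Option<ModelInfo> {
    let slug = model_family.slug.as_str();
    match slug {
        // gpt-oss models are supposed to resolve to a 128k context window.
        s if s.starts_with("gpt-oss") => Some(ModelInfo {
            context_window: 131_072,
        }),
        _ => None,
    }
}

fn main() {
    let fam = ModelFamily {
        slug: "gpt-oss:20b".to_string(),
    };
    println!("{:?}", get_model_info(&fam));
}
```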

You are right. In the meantime, I tried to correct this behavior in my local fork of codex-rs. With a lot of tinkering, I sometimes got "ollama ps" to show...

Curious. When you run "ollama ps" during Codex operation, is the model always loaded with 32k, or does it sometimes 'switch back' to 8k?
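One way to answer that, as a sketch (assuming your Ollama build reports the loaded context size in `ollama ps`, which this thread suggests it does): poll it in a loop while Codex is working and watch whether the reported number flips between 32k and 8k.

```rust
use std::process::Command;
use std::{thread, time::Duration};

fn main() {
    // Print `ollama ps` every 10 seconds for ~5 minutes while Codex is running,
    // so a reload with a smaller context shows up in the log.
    for _ in 0..30 {
        let out = Command::new("ollama")
            .arg("ps")
            .output()
            .expect("is ollama on PATH?");
        print!("{}", String::from_utf8_lossy(&out.stdout));
        println!("---");
        thread::sleep(Duration::from_secs(10));
    }
}
```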

Just for other readers of this thread: I had success running llama.cpp with the full 128k context for both the 20B and the 120B GPT-OSS models (quad 3090 system here)....
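For anyone wanting to reproduce that, a minimal launch sketch (the GGUF path, port, and GPU-layer count are placeholders, not the exact configuration used above): it starts llama.cpp's `llama-server` with a 131072-token context and full GPU offload, after which a client can be pointed at the OpenAI-compatible endpoint it exposes.

```rust
use std::process::Command;

fn main() {
    // Placeholder paths/values; adjust the GGUF file and port for your machine.
    let status = Command::new("llama-server")
        .args([
            "-m", "/models/gpt-oss-20b.gguf", // or the 120B GGUF
            "-c", "131072",                   // full 128k context
            "-ngl", "999",                    // offload all layers to the GPUs
            "--host", "0.0.0.0",
            "--port", "8080",
        ])
        .status()
        .expect("llama-server not found on PATH");
    println!("llama-server exited with {status}");
}
```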