Support the /completions endpoint for the inbuilt inference server. _Originally posted by @SeriousJ55 in https://github.com/codelion/optillm/discussions/168#discussioncomment-12403092_
Based on the idea here - https://www.reddit.com/r/ollama/comments/1grlewl/comment/lxdd6hr/?context=3
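To make the request concrete, here is a minimal sketch of what a legacy `/completions` route could look like. It assumes a Flask app in the style of optillm's proxy; `run_inference` is a hypothetical stand-in for the inbuilt inference server's generate call, not an existing optillm function.

```python
# Sketch of a legacy OpenAI-style /completions route (assumptions noted above).
from flask import Flask, jsonify, request
import time
import uuid

app = Flask(__name__)

def run_inference(prompt: str, max_tokens: int, temperature: float) -> str:
    # Hypothetical hook into the inbuilt inference server; replace with the
    # actual local-model generate call.
    return "stub completion for: " + prompt[:40]

@app.route("/v1/completions", methods=["POST"])
def completions():
    body = request.get_json(force=True)
    prompt = body.get("prompt", "")
    if isinstance(prompt, list):  # the legacy API also accepts a list of prompts
        prompt = prompt[0]
    text = run_inference(
        prompt,
        max_tokens=body.get("max_tokens", 16),
        temperature=body.get("temperature", 1.0),
    )
    # Mirror the OpenAI text-completion response shape.
    return jsonify({
        "id": f"cmpl-{uuid.uuid4().hex[:24]}",
        "object": "text_completion",
        "created": int(time.time()),
        "model": body.get("model", "local"),
        "choices": [{"text": text, "index": 0, "finish_reason": "stop"}],
    })
```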
## 🎯 Overview

Implement a novel inference optimization approach inspired by [this research idea](https://x.com/willccbb/status/1940557166248972387): **a lightweight retriever that processes streaming Chain-of-Thought reasoning to inject contextual hints from a memory bank,...
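As an illustration of the retriever-in-the-loop idea, here is a hedged sketch (not optillm code): `embed` is a hypothetical sentence-embedding function, the memory bank is a list of `(vector, hint)` pairs, and hints are spliced into the stream as plain text. The window size and threshold are illustrative defaults.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def stream_with_hints(token_stream, memory_bank, embed, window=64, threshold=0.8):
    """Yield tokens as they stream; every `window` tokens, embed the recent
    reasoning and inject the best-matching hint from the memory bank."""
    buffer = []
    for token in token_stream:
        buffer.append(token)
        yield token
        # Re-query periodically rather than per token to keep the retriever cheap.
        if memory_bank and len(buffer) % window == 0:
            query = embed("".join(buffer[-window:]))
            best = max(memory_bank, key=lambda m: cosine(query, m[0]))
            if cosine(query, best[0]) >= threshold:
                yield f"\n[hint: {best[1]}]\n"
```

The key design choice this sketch highlights is that retrieval runs on the partial CoT itself, so hints arrive mid-reasoning rather than only at prompt time.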
# Problem

The emergence of the **inference-time compute paradigm** presents a critical safety challenge: ensuring **chain-of-thought (CoT) faithfulness**. Through our work on **OptiLLM**, an open-source inference optimization framework implementing over...
# Proposal: Implement CoRT (Code-integrated Reasoning within Thinking) Approach

## Summary

Implement a new approach inspired by the [CoRT paper](https://arxiv.org/abs/2506.09820) that combines our existing `thinkdeeper` and `z3` capabilities to create...
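One way the interleaving could work, sketched under stated assumptions: the `<solve>` tag, the exec-based execution, and the splice-back format are all illustrative here, not the CoRT paper's protocol or optillm's current `z3` plugin API.

```python
import re
from z3 import Solver, Int, sat  # requires `pip install z3-solver`

SOLVE_TAG = re.compile(r"<solve>(.*?)</solve>", re.DOTALL)

def run_z3_block(code: str) -> str:
    """Execute a z3 snippet that defines a Solver named `s`; return its model."""
    env = {"Solver": Solver, "Int": Int, "sat": sat}
    exec(code, env)  # trusted sandbox assumed; never exec raw user input
    s = env["s"]
    return str(s.model()) if s.check() == sat else "unsat"

def integrate_solver(thinking: str) -> str:
    # Replace each <solve> block with the solver's verdict so subsequent
    # reasoning can condition on a verified result instead of a guess.
    return SOLVE_TAG.sub(lambda m: f"[z3: {run_z3_block(m.group(1))}]", thinking)

# Example: the model reasons, delegates a constraint to z3, then continues.
trace = "Need x>3 and x<5. <solve>s = Solver(); x = Int('x'); s.add(x>3, x<5)</solve> done."
print(integrate_solver(trace))  # -> "Need x>3 and x<5. [z3: [x = 4]] done."
```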
Hey there, I've been investigating a memory issue in the SimpleHeap implementation and found that `removed_node_tuples` can grow indefinitely during pathfinding, especially on larger grids. The problem is that when...
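A common fix for this pattern is to bound the tombstone set by compacting the heap once stale entries dominate. Below is a sketch, assuming `SimpleHeap` implements lazy deletion over `heapq`; the field name `removed_node_tuples` mirrors the report, while the compaction threshold and the other method names are assumptions.

```python
import heapq

class SimpleHeap:
    def __init__(self):
        self._heap = []
        self.removed_node_tuples = set()

    def push(self, item):
        heapq.heappush(self._heap, item)

    def remove(self, item):
        self.removed_node_tuples.add(item)
        # Compact once more than half the heap is stale, instead of letting
        # removed_node_tuples grow for the entire pathfinding run.
        if len(self.removed_node_tuples) > len(self._heap) // 2:
            self._heap = [t for t in self._heap if t not in self.removed_node_tuples]
            heapq.heapify(self._heap)
            self.removed_node_tuples.clear()

    def pop(self):
        while self._heap:
            item = heapq.heappop(self._heap)
            if item in self.removed_node_tuples:
                self.removed_node_tuples.discard(item)  # free the tombstone
                continue
            return item
        raise IndexError("pop from empty heap")
```

With this bound, memory stays proportional to the number of live entries, and pops that drain tombstones also shrink the set, so large grids no longer accumulate stale tuples indefinitely.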