aminekhelif

@saiqulhaq Yes, I set the timeout to 120 and lowered all the temperatures. It seems to reduce the meaningless-token generation, but I’m still encountering the cognitive...
@d8rt8v Yes, I also tried using `nvidia_llama-3.1-nemotron-70b-instruct-hf`. It performs better in the test campaign, but the `cognitive_state` problem still appears with this config:

```
MODEL=nvidia_llama-3.1-nemotron-70b-instruct-hf
MAX_TOKENS=3000
TEMPERATURE=0.0
FREQ_PENALTY=0.0
PRESENCE_PENALTY=0.0
TIMEOUT=480
...
```
After 7 hours of testing, I resolved the `cognitive_state` problem by implementing structured output schemas for each prompt. This guides the LLM’s output, resulting in...
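For reference, here is a minimal sketch of the idea, not the exact code from my fork: LM Studio’s OpenAI-compatible server accepts a `json_schema` response format, and the schema name and fields below are illustrative.

```python
# Minimal sketch, assuming LM Studio's OpenAI-compatible server on its
# default port. The schema name and fields are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# JSON schema constraining the model's reply for one prompt type.
cognitive_state_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "cognitive_state",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "state": {"type": "string"},
                "reasoning": {"type": "string"},
            },
            "required": ["state", "reasoning"],
            "additionalProperties": False,
        },
    },
}

response = client.chat.completions.create(
    model="nvidia_llama-3.1-nemotron-70b-instruct-hf",
    messages=[{"role": "user", "content": "Report your current cognitive state."}],
    response_format=cognitive_state_schema,  # forces schema-conformant JSON
    temperature=0.0,
)
print(response.choices[0].message.content)  # parses as the schema above
```

With the schema enforced server-side, the reply can be fed straight into `json.loads` instead of being re-prompted when it drifts.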
@OriginalGoku I didn’t modify the README file, but in essence I adapted the LLM calls to work with local LLMs (LM Studio, in my case). I also added all the...
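Roughly, the adaptation looks like this; the `LLM_BASE_URL` variable is my own, while the other variable names come from the test config I posted above:

```python
# Rough sketch: any OpenAI-style client can target a local server by
# overriding base_url. LM Studio listens on http://localhost:1234/v1
# by default; LLM_BASE_URL is a hypothetical variable of mine.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("LLM_BASE_URL", "http://localhost:1234/v1"),
    api_key="lm-studio",  # LM Studio accepts any non-empty key
    timeout=float(os.getenv("TIMEOUT", "480")),
)

response = client.chat.completions.create(
    model=os.getenv("MODEL", "nvidia_llama-3.1-nemotron-70b-instruct-hf"),
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=int(os.getenv("MAX_TOKENS", "3000")),
    temperature=float(os.getenv("TEMPERATURE", "0.0")),
    frequency_penalty=float(os.getenv("FREQ_PENALTY", "0.0")),
    presence_penalty=float(os.getenv("PRESENCE_PENALTY", "0.0")),
)
print(response.choices[0].message.content)
```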
@4F2E4A2E Have you tried my fork? PS: I use LM Studio.
@4F2E4A2E Have you tried this branch? I use LM Studio, not Ollama.
@P3GLEG Whichever client you use (Ollama, Azure, or OpenAI), the `set_api_cache` method is required, and it is not implemented in the Ollama client.
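A hypothetical stub along these lines might satisfy the shared interface; I’m guessing at the signature, since the real one isn’t shown here:

```python
# Hypothetical stub for the Ollama client; the method name comes from the
# error above, but the signature and attribute are my assumptions.
class OllamaClient:
    def set_api_cache(self, cache):
        # Keep the reference so the common client interface is satisfied,
        # even if a local model never reads from the API cache.
        self.api_cache = cache
```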