bug: Request timeout/cancellation with slow inference models (1-4 t/s) [Critical]
Version: Jan.ai v0.6.9
Describe the Bug
Jan.ai consistently terminates long-form AI generation with a "Request Cancelled" error after approximately 120-200 seconds of inference, regardless of model performance or system resources. The timeout appears to be a hard-coded frontend limitation that prevents completion of extended reasoning tasks, mathematical problems, or creative writing requiring sustained generation. The same models run perfectly in KoboldCpp (which uses the identical llama.cpp backend) without any timeout issues, suggesting this is a Jan.ai architectural problem rather than a model or hardware limitation.
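For illustration of what such a frontend limitation could look like (a hypothetical sketch, not Jan.ai's actual code), a client-side request helper with a fixed AbortController timeout produces exactly this symptom: the backend keeps generating while the UI reports a cancellation. The endpoint URL, function name, and the 120-second budget below are all assumptions.

```typescript
// Hypothetical sketch only - NOT Jan.ai source. Shows how a hard-coded
// client-side abort timer surfaces as "Request Cancelled" even though the
// inference backend is still producing tokens.
const CHAT_URL = "http://localhost:1337/v1/chat/completions"; // assumed local endpoint
const HARD_TIMEOUT_MS = 120_000; // a fixed budget in this range would match the observed cutoff

async function requestCompletion(prompt: string): Promise<string> {
  const controller = new AbortController();
  // The timer fires on wall-clock time, regardless of whether tokens are still arriving.
  const timer = setTimeout(() => controller.abort(), HARD_TIMEOUT_MS);
  try {
    const res = await fetch(CHAT_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
      signal: controller.signal,
    });
    return await res.text();
  } finally {
    // On abort, fetch rejects with an AbortError, which a UI layer would
    // typically render as a cancellation rather than a timeout.
    clearTimeout(timer);
  }
}
```

At 1-4 tokens/second, any fixed budget of this size is exhausted long before a 500+ token response can finish, which would explain why the cutoff tracks elapsed time rather than load or memory pressure.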
Steps to Reproduce
1. Load any large language model (tested with GLM-4.5-AIR IQ4_XS, ~60GB).
2. Configure the model with standard settings (multiple batch-size configurations tested).
3. Submit a complex prompt requiring 500+ tokens of response (philosophical reasoning, long-form creative writing, mathematical problems).
4. Observe that the model begins generating at normal speed (1-4 tokens/second).
5. After ~120-200 seconds of generation, Jan.ai displays a "Request Cancelled" error and terminates the response.
6. Attempt the same prompt in KoboldCpp with the identical model - it completes successfully without a timeout.
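To help isolate where the cancellation originates, one can bypass the chat UI and time a long request against the local inference server directly. A minimal sketch, assuming Jan's local OpenAI-compatible API server is enabled and reachable at http://localhost:1337 (the URL, port, and model id are assumptions; adjust them to the local setup):

```typescript
// Timing harness: send a long-generation request straight to the local
// OpenAI-compatible endpoint and report how long it ran before finishing
// or failing. If this runs well past 200 s while the Jan chat UI still
// cancels at ~120-200 s, the timeout lives in the frontend, not llama.cpp.
const BASE_URL = "http://localhost:1337"; // assumed port; check Jan's local API settings/logs

async function timeLongGeneration(): Promise<void> {
  const start = Date.now();
  try {
    const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "glm-4.5-air-iq4_xs", // placeholder model id
        messages: [
          { role: "user", content: "Write a detailed 2000-word essay on epistemology." },
        ],
        max_tokens: 2000,
        stream: false,
      }),
    });
    const body = await res.json();
    console.log(`HTTP ${res.status} after ${((Date.now() - start) / 1000).toFixed(1)} s`);
    console.log(body.choices?.[0]?.message?.content?.slice(0, 200));
  } catch (err) {
    console.error(`Failed after ${((Date.now() - start) / 1000).toFixed(1)} s:`, err);
  }
}

timeLongGeneration();
```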
Screenshots / Logs
Attempted Configuration Fixes (All Failed):
- Reduced batch_size from 2048 to 256
- Reduced ubatch_size from 512 to 128
- Set environment variables: LLAMA_SERVER_TIMEOUT=3600, LLAMA_REQUEST_TIMEOUT=3600
- Modified the model YAML and JSON configs with timeout parameters
- Disabled dynamic batching
- Reduced context size to 2048
Error Behavior:
- Timeout occurs regardless of available system resources (70GB+ free RAM)
- Model continues generating normally until the sudden cancellation
- No memory pressure, swap usage, or CPU throttling observed
- Issue persists across different model sizes and quantization levels
Validation via Alternative Client:
- Same model in KoboldCpp: generates 2000+ tokens without issue
- Same hardware, same llama.cpp backend, same model files
- The only difference is the Jan.ai vs. KoboldCpp frontend
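The KoboldCpp comparison can be scripted the same way. A rough sketch, assuming KoboldCpp's default port (5001) and its KoboldAI-style /api/v1/generate route (verify both against the local instance):

```typescript
// Drive KoboldCpp's generate endpoint directly with a long completion and
// confirm it finishes with no client-side cutoff, matching the 2000+ token
// runs observed in its own UI.
const KOBOLD_URL = "http://localhost:5001/api/v1/generate"; // assumed default port/route

async function koboldLongRun(): Promise<void> {
  const start = Date.now();
  const res = await fetch(KOBOLD_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: "Write a detailed, multi-part analysis of the trolley problem.",
      max_length: 2000, // far more than ever completes in Jan.ai at 1-4 t/s
    }),
  });
  const body = await res.json();
  console.log(`Completed in ${((Date.now() - start) / 1000).toFixed(1)} s`);
  console.log(`${body.results?.[0]?.text?.length ?? 0} characters generated`);
}

koboldLongRun();
```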
Operating System
- Linux (Manjaro - primary testing)
- Linux (LMDE 6 on AMD 4800H laptop - confirmed replication)
Hardware Tested:
- Primary: AMD Threadripper 1950X, 128GB DDR4-2133, AMD RX480 (GPU disabled)
- Secondary: AMD 4800H laptop, 16GB RAM (both iGPU and CPU-only modes affected)
Performance Context:
- Models tested: GLM-4.5-AIR IQ4_XS, Anubis-Pro-105B, BigTiger-27B
- Inference speeds: 1-4 tokens/second (hardware bandwidth limited)
- All models exhibit identical timeout behavior in Jan.ai
- All models work perfectly in KoboldCpp / raw llama.cpp
Impact: This bug makes Jan.ai unsuitable for any use case requiring extended generation, including academic research, creative writing, complex reasoning tasks, or technical analysis. Users are forced to switch to alternative frontends to access full model capabilities.
Operating System
- [ ] MacOS
- [ ] Windows
- [x] Linux