Fix issue #183: Pre-download models during server initialization to prevent HTTP timeouts
This commit moves the model-download logic from the chat_completion method to the initialize method in OllamaInferenceAdapter. By downloading required models during server startup, we ensure that large models (e.g., a 16 GB checkpoint) finish downloading before the server accepts requests, preventing HTTP timeouts and aborted downloads on the first inference request.
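For context, here is a minimal sketch of the shape of the change, assuming the adapter wraps ollama's AsyncClient and maps llama-stack model identifiers to Ollama tags. The OLLAMA_SUPPORTED_MODELS mapping and the method bodies are illustrative, not the exact code in the repository:

```python
import asyncio

from ollama import AsyncClient

# Illustrative mapping from llama-stack model identifiers to Ollama tags;
# the real adapter's table and names may differ.
OLLAMA_SUPPORTED_MODELS = {
    "Llama3.1-8B-Instruct": "llama3.1:8b-instruct-fp16",
}


class OllamaInferenceAdapter:
    def __init__(self, url: str) -> None:
        self.url = url

    @property
    def client(self) -> AsyncClient:
        return AsyncClient(host=self.url)

    async def initialize(self) -> None:
        # Pull every supported model during server startup so the first
        # chat_completion call never blocks on a multi-GB download and
        # times out. Ollama's pull returns quickly when the model is
        # already present locally.
        for ollama_model in OLLAMA_SUPPORTED_MODELS.values():
            print(f"Ensuring model is available locally: {ollama_model}")
            await self.client.pull(ollama_model)

    async def chat_completion(self, model: str, messages: list[dict]):
        # No download logic here anymore; initialize() guarantees the
        # model is already on disk.
        ollama_model = OLLAMA_SUPPORTED_MODELS[model]
        return await self.client.chat(model=ollama_model, messages=messages)


if __name__ == "__main__":
    async def main() -> None:
        adapter = OllamaInferenceAdapter(url="http://localhost:11434")
        await adapter.initialize()  # models are pulled here, not per-request

    asyncio.run(main())
```

The trade-off is a slower first startup in exchange for predictable request latency: downloads happen once, under the server's control, rather than inside a client request that may enforce its own timeout.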
Closes #183