llama-stack
Fix issue #183: Pre-download models during server initialization to prevent HTTP timeouts
This commit moves the model downloading logic from the `chat_completion` method to the `initialize` method in `OllamaInferenceAdapter`. By pre-loading required models during server startup, we ensure that large models (e.g., 16 GB) are downloaded before any requests are served, preventing HTTP request timeouts and aborted downloads during the first inference request.
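For context, a minimal sketch of the shape of this change, assuming the adapter wraps the `ollama` Python package's `AsyncClient`; the constructor signature and the `models` list here are illustrative, not the exact llama-stack internals:

```python
from ollama import AsyncClient


class OllamaInferenceAdapter:
    # Hypothetical constructor: the real adapter's configuration differs.
    def __init__(self, url: str, models: list[str]) -> None:
        self.client = AsyncClient(host=url)
        self.models = models  # models to pre-download at startup

    async def initialize(self) -> None:
        # Pull every required model during server initialization so the
        # first chat_completion call never blocks on a multi-GB download.
        for model in self.models:
            await self.client.pull(model)

    async def chat_completion(self, model: str, messages: list[dict]):
        # No download step here anymore; the model is already local.
        return await self.client.chat(model=model, messages=messages)
```

The trade-off is slower server startup in exchange for predictable request latency: the long-running download happens once, before the HTTP server accepts traffic, instead of inside a request with a client-side timeout.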
Closes #183