

Fix issue #183: Pre-download models during server initialization to prevent HTTP timeouts

This commit moves the model-downloading logic from the chat_completion method to the initialize method of OllamaInferenceAdapter. By pre-loading the required models during server startup, large models (e.g., 16 GB) are downloaded before any requests are served, which prevents HTTP request timeouts and aborted downloads on the first inference request.
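A minimal sketch of the idea, not the actual patch: the adapter pulls the model via the Ollama client during initialize rather than lazily inside chat_completion. The attribute names (self.client, self.model) and the constructor signature are assumptions for illustration; the real adapter in llama-stack may be structured differently.

```python
from ollama import AsyncClient


class OllamaInferenceAdapter:
    """Simplified stand-in for the real adapter; attribute names are assumptions."""

    def __init__(self, url: str, model: str) -> None:
        self.client = AsyncClient(host=url)
        self.model = model

    async def initialize(self) -> None:
        # Pull the model at server startup so a multi-GB download finishes
        # before the server starts accepting inference requests.
        await self.client.pull(self.model)

    async def chat_completion(self, messages: list[dict]) -> dict:
        # No pull here anymore: the model is already present locally,
        # so the first request cannot time out waiting on a large download.
        return await self.client.chat(model=self.model, messages=messages)
```

The trade-off is a slower server startup in exchange for predictable request latency: the long-running download happens once, outside any HTTP request lifecycle.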

Closes #183

Cola-Rex, Oct 08 '24