llama-stack
Fix issue #183: Pre-download models during server initialization to prevent HTTP timeouts
This commit moves the model downloading logic from the `chat_completion` method to the `initialize` method in `OllamaInferenceAdapter`. By pre-loading required models during server startup, we ensure that large models (e.g., 16 GB) are downloaded before any requests are served, preventing HTTP request timeouts and aborted downloads during the first inference request.
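For context, a minimal sketch of the shape of this change, assuming the adapter wraps the `ollama` Python package's `AsyncClient`; the constructor signature and the `models` list here are illustrative, not the exact llama-stack internals:

```python
from ollama import AsyncClient


class OllamaInferenceAdapter:
    # Hypothetical constructor: the real adapter's configuration differs.
    def __init__(self, url: str, models: list[str]) -> None:
        self.client = AsyncClient(host=url)
        self.models = models  # models to pre-download at startup

    async def initialize(self) -> None:
        # Pull every required model during server initialization so the
        # first chat_completion call never blocks on a multi-GB download.
        for model in self.models:
            await self.client.pull(model)

    async def chat_completion(self, model: str, messages: list[dict]):
        # No download step here anymore; the model is already local.
        return await self.client.chat(model=model, messages=messages)
```

The trade-off is slower server startup in exchange for predictable request latency: the long-running download happens once, before the HTTP server accepts traffic, instead of inside a request with a client-side timeout.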
Closes #183