How to run 8 instances of the same model on 8 GPUs
I have tried many approaches without success. Environment: Windows 10 22H2, latest release and nightly builds.
Methods tried:
1. Multiple `ollama serve` instances: failed, the model still gets split across GPUs instead of each instance being pinned to its own GPU.
2. Multiple Ollama Docker containers, one per GPU: `--gpus all` with `CUDA_VISIBLE_DEVICES=0`, `--gpus all` with `CUDA_VISIBLE_DEVICES=1`, and so on. I mounted locally copied model paths into each container (to save download time), but model loading is extremely slow (a 32B model took 15 minutes), and the containers are often killed by running out of GPU VRAM.
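For reference, this is roughly what I ran; the ports, model paths, and mount locations here are examples, not my exact values:

```
REM Method 1: one native instance per GPU (each block in its own terminal).
REM OLLAMA_HOST sets the bind address/port; CUDA_VISIBLE_DEVICES pins the GPU.
set CUDA_VISIBLE_DEVICES=0
set OLLAMA_HOST=127.0.0.1:11434
ollama serve

set CUDA_VISIBLE_DEVICES=1
set OLLAMA_HOST=127.0.0.1:11435
ollama serve

REM Method 2: one Docker container per GPU, mounting a pre-copied model dir
REM into the default model location (/root/.ollama) to skip re-downloading.
docker run -d --gpus all -e CUDA_VISIBLE_DEVICES=0 ^
  -v D:\ollama-models\gpu0:/root/.ollama -p 11434:11434 ollama/ollama
docker run -d --gpus all -e CUDA_VISIBLE_DEVICES=1 ^
  -v D:\ollama-models\gpu1:/root/.ollama -p 11435:11434 ollama/ollama
```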
What I need: multiple servers of the same model running locally, one per GPU.
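Ideally each instance would be reachable on its own port, something like this (ports are hypothetical):

```
REM Expected result: eight independent endpoints, one per GPU
curl http://127.0.0.1:11434/api/tags
curl http://127.0.0.1:11435/api/tags
REM ... up through 127.0.0.1:11441
```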
Thanks!!!