
Running llama-stack with 8B Llama on an AWS CPU-only instance throws an error

Open ShadiCopty opened this issue 1 year ago • 2 comments

I'm trying to run llama stack on a CPU-only node to try it out, and I'm getting the error below. I can't find a way to tell it I don't have a GPU:

File "/home/ubuntu/anaconda3/envs/llamastack-my-local-stack/lib/python3.10/site-package s/torch/distributed/distributed_c10d.py", line 1594, in _new_process_group_helper backend_class = ProcessGroupNCCL( ValueError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

ShadiCopty avatar Oct 07 '24 03:10 ShadiCopty

Typically, running models without GPUs will be very slow. Partners like Ollama offer a way to run on CPUs. You can try llama stack with Ollama to run on a CPU-only node. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distribution/templates/local-ollama-build.yaml
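As a quick sanity check that the CPU-only path works at all, you can query a locally running Ollama server over its REST API before wiring it into llama stack. This is a minimal sketch, not the llama stack integration itself; it assumes Ollama is installed, `ollama serve` is running on the default port 11434, and a model (llama3.1:8b here, as an example) has already been pulled.

```python
# Minimal sketch: query a locally running Ollama server over its REST API.
# Assumes `ollama serve` is listening on the default port (11434) and the
# model was pulled beforehand, e.g. `ollama pull llama3.1:8b`.
import json
import urllib.request

payload = {
    "model": "llama3.1:8b",   # any model already pulled into Ollama
    "prompt": "Say hello in one short sentence.",
    "stream": False,          # ask for a single JSON response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read().decode("utf-8"))
    print(body["response"])
```

If that returns text on your CPU-only instance, the Ollama backend is working and you can point a llama stack build at it using the template linked above.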

raghotham avatar Oct 07 '24 18:10 raghotham

What is the recommended minimum system spec to successfully run llama stack's default model in your quick start? I've been trying different configurations without much success.

ShadiCopty avatar Oct 08 '24 02:10 ShadiCopty

It was the quantization that was causing the errors -- closing this.

ShadiCopty avatar Oct 10 '24 14:10 ShadiCopty