Running llama-stack with an 8B Llama model on a CPU-only AWS instance throws an error
I'm trying out Llama Stack on a CPU-only node and hitting the error below; I can't find a way to tell it I don't have a GPU:
File "/home/ubuntu/anaconda3/envs/llamastack-my-local-stack/lib/python3.10/site-package s/torch/distributed/distributed_c10d.py", line 1594, in _new_process_group_helper backend_class = ProcessGroupNCCL( ValueError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
Running models without GPUs is typically very slow. Partners like Ollama offer a way to run on CPUs, so you can try Llama Stack with Ollama on a CPU-only node. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distribution/templates/local-ollama-build.yaml
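Once Ollama is serving a model locally (it listens on port 11434 by default), you can sanity-check CPU inference with a plain HTTP call before pointing Llama Stack at it. A rough sketch, assuming a model tagged llama3:8b has already been pulled:

```python
import json
import urllib.request

# Query a locally running Ollama server to confirm CPU-only inference works.
# The model tag "llama3:8b" is an assumption; use whatever tag you pulled.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(
        {"model": "llama3:8b", "prompt": "Say hello.", "stream": False}
    ).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```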
What is the recommended minimum system spec to successfully run Llama Stack's default model from the quick start? I've been trying different configurations without much success.
It turned out to be the quantization setting that was causing the errors -- closing this.