ShadiCopty
What is the recommended minimum system spec to successfully run Llama Stack's default model in your quick start? I've been trying different configurations without much success.
It was the quantization that was causing the errors -- closing this.
Thank you Itime-ren! It would be nice to have this instead of (e.g., https://....). Also, I'm curious why I didn't need to go through this process with Ollama (download was much more...
Leaving this open for the suggestions above.
Absolutely.
Model: Llama3.1-8B-Instruct
[run.yaml.zip](https://github.com/user-attachments/files/17329799/run.yaml.zip)
Removing the fp8 quantization gets this stack to work. Let me know if you need more info regarding the system.
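For anyone else hitting this, below is a minimal sketch of what the change looks like in the inference-provider section of run.yaml. The field names (`provider_id`, `quantization`, etc.) are taken from my config and may differ across llama-stack versions, so treat them as assumptions rather than the canonical schema:

```yaml
# Sketch of the inference provider section of run.yaml
# (field names may vary across llama-stack versions).
providers:
  inference:
    - provider_id: meta-reference
      provider_type: meta-reference
      config:
        model: Llama3.1-8B-Instruct
        max_seq_len: 4096
        # This was the block causing the crash on my machine.
        # Deleting it (i.e., loading the model unquantized) made
        # the stack start successfully:
        # quantization:
        #   type: fp8
```

Note that without the quantization block, the 8B model loads at its native 16-bit precision, which needs on the order of 16 GB of GPU memory for the weights alone (8B parameters × 2 bytes) -- probably also relevant to the minimum-spec question above.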
```
/home/paperspace/anaconda3/envs/llamastack-mylocal/lib/python3.10/site-packages/torch/__init__.py:1145: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:432.)
  _C._set_default_tensor_type(t)
E1023 03:48:27.577000 3035 anaconda3/envs/llamastack-mylocal/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:732] failed (exitcode: -9) local_rank: 0...
```

(`exitcode: -9` means the worker was killed with SIGKILL, which on Linux usually points to the out-of-memory killer.)
@ashwinb Still failing. I removed the old installation entirely to be sure, and I am using the meta-reference-quantized implementation with fp8.