llama-stack-apps
Step-by-step instructions to install and run the Llama Stack on Linux and Mac
I managed to make the Llama Stack server and client work with Ollama on both EC2 (with 24GB GPU) and Mac (tested on 2021 M1 and 2019 2.4GHz i9 MBP, both with 32GB memory). Steps are below:
- Open one Terminal, go to your work directory, then:
git clone https://github.com/meta-llama/llama-agentic-system
cd llama-agentic-system
conda create -n llama-stack python=3.10
conda activate llama-stack
pip install -r requirements.txt
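As a quick sanity check that the environment is ready, here is a minimal Python snippet (standard library only; it assumes the llama CLI used in the later steps was installed by the requirements above):
import shutil
import sys

# Confirm the Python version and that the llama CLI is on the PATH of the
# active llama-stack conda environment; None means the install did not finish.
print("python:", sys.version.split()[0])
print("llama CLI:", shutil.which("llama"))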
- If you're on Linux, run:
curl -fsSL https://ollama.com/install.sh | sh
Otherwise, download the Ollama zip for Mac here, unzip it, move Ollama.app to the Applications folder, and double-click it to launch.
- On the same Terminal, run:
ollama pull llama3.1:8b-instruct-fp16
to download the Llama 3.1 8B model and then run:
ollama run llama3.1:8b-instruct-fp16
to confirm it works by entering a question and checking Llama's answer.
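If you prefer to confirm the model programmatically rather than through the interactive prompt, here is a minimal sketch that calls Ollama's REST API directly (it assumes Ollama is listening on its default port 11434, the same URL the Llama Stack configuration uses below; the example question is just an illustration):
import json
import urllib.request

# Ask the locally pulled model a question through Ollama's /api/generate endpoint.
payload = {
    "model": "llama3.1:8b-instruct-fp16",
    "prompt": "In one sentence, what is the capital of Switzerland?",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])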
- Now run the command below to install Llama Stack's Ollama distribution:
llama distribution install --spec local-ollama --name ollama
You should see the prompts below; hit Enter to accept the default settings, except answer n to the two questions about llama_guard_shield and prompt_guard_shield:
Successfully setup distribution environment. Configuring...
Configuring API surface: inference
Enter value for url (default: http://localhost:11434):
Configuring API surface: safety
Do you want to configure llama_guard_shield? (y/n): n
Do you want to configure prompt_guard_shield? (y/n): n
Configuring API surface: agentic_system
YAML configuration has been written to /Users/<your_name>/.llama/distributions/ollama/config.yaml
Distribution ollama (with spec local-ollama) has been installed successfully!
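If you are curious about what was configured, you can print the generated file; this sketch just dumps it as text, so it makes no assumptions about the YAML's keys (the path is the one reported in the output above):
from pathlib import Path

# Print the config written by the llama distribution install step.
config_path = Path.home() / ".llama" / "distributions" / "ollama" / "config.yaml"
print(config_path.read_text())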
- Launch the ollama distribution by running:
llama distribution start --name ollama --port 5000
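Once it starts, you should see the Uvicorn log line shown further below in this Terminal. If you want to check from code that the server is actually listening on port 5000 before running the examples, here is a minimal sketch (just a TCP connect, no assumptions about the server's endpoints):
import socket

# Try to open a TCP connection to the Llama Stack server started above.
try:
    with socket.create_connection(("localhost", 5000), timeout=3):
        print("server is listening on port 5000")
except OSError as e:
    print("not reachable yet:", e)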
- Finally, on another Terminal, go to the llama-agentic-system folder, then:
conda activate ollama
and either (on Mac)
python examples/scripts/vacation.py localhost 5000 --disable_safety
or (on Linux)
python examples/scripts/vacation.py [::] 5000 --disable_safety
You should see output starting with the following (Note: if you run the script right after launching the distribution in the previous step, especially on a slower machine such as the 2019 Mac with the 2.4GHz i9, you may see "httpcore.ReadTimeout" because the Llama model is still being loaded; waiting a moment and retrying a few times should work):
User> I am planning a trip to Switzerland, what are the top 3 places to visit?
StepType.inference> Switzerland is a beautiful country with a rich history, stunning landscapes, and vibrant culture. Here are three top places to visit in Switzerland:
- Jungfraujoch: Also known as the "Top of Europe," Jungfraujoch is the highest train station in Europe, located at an altitude of 3,454 meters (11,332 feet) above sea level. It offers breathtaking views of the surrounding mountains and glaciers, including the iconic Eiger, Mönch, and Jungfrau peaks.
and on the first Terminal that runs llama distribution start --name ollama --port 5000, you should see:
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
Environment: ipython
Tools: brave_search, wolfram_alpha, photogen
Cutting Knowledge Date: December 2023
Today Date: 09 August 2024
INFO: ::1:50987 - "POST /agentic_system/create HTTP/1.1" 200 OK
INFO: ::1:50988 - "POST /agentic_system/session/create HTTP/1.1" 200 OK
INFO: ::1:50989 - "POST /agentic_system/turn/create HTTP/1.1" 200 OK
role='user' content='I am planning a trip to Switzerland, what are the top 3 places to visit?'
Pulling model: llama3.1:8b-instruct-fp16
Assistant: Switzerland is a beautiful country with a rich history, stunning landscapes, and vibrant culture. Here are three top places to visit in Switzerland:
- Jungfraujoch: Also known as the "Top of Europe," Jungfraujoch is a mountain peak located in the Bernese Alps. It's the highest train station in Europe, offering breathtaking views of the surrounding mountains, glaciers, and valleys. You can take a ride on the Jungfrau Railway, which takes you to the summit, where you can enjoy stunning vistas, visit the Ice Palace, and even ski or snowboard in the winter.
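As noted above, the first run can fail with "httpcore.ReadTimeout" while Ollama is still loading the model. If you'd rather not retry by hand, here is a small sketch that simply re-runs the example script a few times; the script path and arguments are the ones from this walkthrough (run it from the llama-agentic-system folder, and replace localhost with [::] on Linux, as above):
import subprocess
import sys
import time

# Re-run the example a few times, giving Ollama time to finish loading the model
# between attempts; the first successful run ends the loop.
cmd = [sys.executable, "examples/scripts/vacation.py", "localhost", "5000", "--disable_safety"]
for attempt in range(1, 6):
    print(f"attempt {attempt}...")
    if subprocess.run(cmd).returncode == 0:
        break
    time.sleep(20)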
Bonus: To see tool calling in action (see here and here for more info), try the hello.py example, which asks Llama "Which players played in the winning team of the NBA western conference semifinals of 2024, please use tools" (a question whose answer needs a web search tool), followed by the prompt "Hello". On Mac, run (replace localhost with [::] on Linux):
python examples/scripts/hello.py localhost 5000 --disable_safety
And you should see output that includes "BuiltinTool.brave_search", as below (if you see "httpcore.ReadTimeout", retrying should work):
User> Hello
StepType.inference> Hello! How can I assist you today?
User> Which players played in the winning team of the NBA western conference semifinals of 2024, please use tools
StepType.inference> brave_search.call(query="NBA Western Conference Semifinals 2024 winning team players")
StepType.tool_execution> Tool:BuiltinTool.brave_search Args:{'query': 'NBA Western Conference Semifinals 2024 winning team players'}
StepType.tool_execution> Tool:BuiltinTool.brave_search Response:{"query": null, "top_k": []}
StepType.shield_call> No Violation
StepType.inference> I need to search for information about the 2024 NBA Western Conference Semifinals.
If you delete "please use tools" from the prompt in hello.py, not wanting to beg, you'll likely see this output instead:
I'm not able to provide real-time information. However, I can suggest some possible sources where you may be able to find the information you are looking for.
By setting an appropriate system prompt, or switching to a larger Llama 3.1 model (details coming soon), you'll see that you don't have to be so polite that Llama is comfortable but you aren't.
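For example, as a quick side experiment against Ollama directly (not the Llama Stack agentic flow used above), you can compare the same question with and without a system prompt; the endpoint and field names below are Ollama's chat API, and the system prompt wording is only an illustration, not the one used by hello.py:
import json
import urllib.request

# Send a chat request to Ollama's /api/chat endpoint and return the reply text.
def chat(messages):
    payload = {"model": "llama3.1:8b-instruct-fp16", "messages": messages, "stream": False}
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

question = {"role": "user", "content": "Which players played in the winning team of the NBA western conference semifinals of 2024?"}
# Without a system prompt, then with one nudging the model toward using a search tool.
print(chat([question]))
print(chat([{"role": "system", "content": "You have a web search tool; when a question needs current facts, say which search you would run."}, question]))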