Add Runpod Provider

Open pandyamarut opened this issue 1 year ago • 6 comments

Why this PR? We want to add Runpod as a remote inference provider for Llama Stack. Runpod endpoints are OpenAI-compatible, so this provider is meant to be used with Runpod model-serving endpoints.

What does this PR include?

  1. Integration with the Distribution.
  2. The OpenAI client, used to talk to the Runpod endpoint.

How did we test? After setting endpoint_url and api_key in the configuration and leaving the other settings at their defaults, we launched a server using:

llama stack run remote_runpod --port 8080

  1. Invoke the call (streaming):

     curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":true}'

Response:

data: {"event":{"event_type":"start","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"Here","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"'s","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"complete","delta":"","logprobs":null,"stop_reason":"end_of_turn"}}

  2. Invoke the call (non-streaming):

     curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":false}'

Response:

data: {"completion_message":{"role":"assistant","content":"Here's a 2-sentence poem about the moon:\n\nThe moon glows softly in the midnight sky, \nA beacon of peace, as it drifts gently by.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
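
For anyone who wants to repeat the two curl checks from a script, here is a minimal Python sketch. It is not part of the PR: the helper names (build_payload, iter_events, collect_stream_text, post_chat_completion) are hypothetical, and it assumes the server is running on localhost:8080 as above and that every response line is a `data: `-prefixed JSON object shaped like the samples shown.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: the server was started with `llama stack run remote_runpod --port 8080`.
BASE_URL = "http://localhost:8080"


def build_payload(prompt: str, stream: bool, model: str = "Llama3.1-8B-Instruct") -> dict:
    """Build the same JSON body the curl commands send."""
    return {
        "model": model,
        "messages": [{"content": prompt, "role": "user"}],
        "stream": stream,
    }


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON object from each 'data: ...' line, skipping other lines."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the 'delta' strings of all 'progress' events in a streamed response."""
    parts = []
    for payload in iter_events(lines):
        event = payload.get("event", {})
        if event.get("event_type") == "progress":
            parts.append(event.get("delta", ""))
    return "".join(parts)


def post_chat_completion(prompt: str, stream: bool) -> list:
    """POST to /inference/chat_completion and return the raw response lines."""
    request = urllib.request.Request(
        f"{BASE_URL}/inference/chat_completion",
        data=json.dumps(build_payload(prompt, stream)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return [raw.decode("utf-8") for raw in response]
```

With the server running, `collect_stream_text(post_chat_completion("write me a 2 sentence poem about the moon", stream=True))` should return the assembled text. For the non-streaming call, `next(iter_events(lines))["completion_message"]["content"]` reads the reply, given the response format shown above.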

pandyamarut, Sep 30 '24 11:09