JetStream
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
JetStream issues (21 results)
Supporting OpenAI-API-compatible endpoints such as the `/v1/chat/completions` and `/v1/completions` APIs would have the following benefits:
* Allow JetStream to be used as a drop-in replacement for the vLLM server
* Make...
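As a rough illustration of what such compatibility would enable, the sketch below sends a standard OpenAI-style chat-completion request to a locally running server. The base URL (`http://localhost:8000/v1`) and model name (`jetstream-model`) are hypothetical placeholders, not part of JetStream today; only the request/response shape follows the public OpenAI chat-completions schema.

```python
# Hypothetical sketch: what a client call could look like if JetStream exposed
# an OpenAI-compatible /v1/chat/completions endpoint. The host, port, and model
# name below are placeholders, not actual JetStream defaults.
import requests

BASE_URL = "http://localhost:8000/v1"  # assumed local server address

payload = {
    "model": "jetstream-model",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize what JetStream is in one sentence."}
    ],
    "max_tokens": 64,
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the schema matches the OpenAI spec, existing clients (for example, the official `openai` SDK pointed at a custom base URL) could talk to the server unchanged, which is what makes the "drop-in replacement for the vLLM server" benefit possible.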