JetStream
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
JetStream issues (21 results)
Supporting OpenAI-API-compatible endpoints such as the `/v1/chat/completions` and `/v1/completions` APIs would have the following benefits:
* Allow JetStream to be used as a drop-in replacement for the vLLM server
* Make...
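As a rough illustration of what such compatibility would enable, the sketch below sends a standard OpenAI-style chat-completion request to a locally running server. The base URL (`http://localhost:8000/v1`) and model name (`jetstream-model`) are hypothetical placeholders, not part of JetStream today; only the request/response shape follows the public OpenAI chat-completions schema.

```python
# Hypothetical sketch: what a client call could look like if JetStream exposed
# an OpenAI-compatible /v1/chat/completions endpoint. The host, port, and model
# name below are placeholders, not actual JetStream defaults.
import requests

BASE_URL = "http://localhost:8000/v1"  # assumed local server address

payload = {
    "model": "jetstream-model",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize what JetStream is in one sentence."}
    ],
    "max_tokens": 64,
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the schema matches the OpenAI spec, existing clients (for example, the official `openai` SDK pointed at a custom base URL) could talk to the server unchanged, which is what makes the "drop-in replacement for the vLLM server" benefit possible.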