
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).



About

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

JetStream Engine Implementation

Currently, two reference engine implementations are available -- one for JAX models and another for PyTorch models.

JAX

  • Git: https://github.com/google/maxtext
  • README: https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md

PyTorch

  • Git: https://github.com/google/jetstream-pytorch
  • README: https://github.com/google/jetstream-pytorch/blob/main/README.md

Documentation

JetStream Standalone Local Setup

Getting Started

Setup

pip install -r requirements.txt

Run and test a local server

Use the following commands to run a server locally:

# Start a server
python -m jetstream.core.implementations.mock.server

# Test local mock server
python -m jetstream.tools.requester

# Load test local mock server
python -m jetstream.tools.load_tester
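As a rough illustration of what a load test like the one above measures -- sustained requests per second under concurrency -- here is a generic, self-contained sketch. It does not use the JetStream API; `fake_generate` is a hypothetical stand-in for a real call to the mock server:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real inference request; the sleep
    # simulates per-request server latency.
    time.sleep(0.01)
    return prompt[::-1]


def load_test(num_requests: int = 100, concurrency: int = 8) -> float:
    """Issue num_requests concurrent calls and return achieved requests/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map fans requests out across worker threads.
        list(pool.map(fake_generate, [f"prompt {i}" for i in range(num_requests)]))
    elapsed = time.perf_counter() - start
    return num_requests / elapsed


if __name__ == "__main__":
    print(f"{load_test():.1f} req/s")
```

The real `jetstream.tools.load_tester` talks to the running mock server over its serving interface; this sketch only mirrors the measurement pattern.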

Test core modules

# Test JetStream core orchestrator
python -m jetstream.tests.core.test_orchestrator

# Test JetStream core server library
python -m jetstream.tests.core.test_server

# Test mock JetStream engine implementation
python -m jetstream.tests.engine.test_mock_engine

# Test mock JetStream token utils
python -m jetstream.tests.engine.test_utils
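The modules above are invoked as standard Python test modules. For orientation, a minimal test module of the same general shape looks like this -- the class and method names here are illustrative, not taken from the JetStream test suite:

```python
import unittest


class TokenEchoTest(unittest.TestCase):
    # Illustrative helper; a real engine test would exercise JetStream code.
    def echo_tokens(self, tokens):
        return list(tokens)

    def test_echo_roundtrip(self):
        # A round trip through the helper should preserve the token ids.
        self.assertEqual(self.echo_tokens([1, 2, 3]), [1, 2, 3])
```

Saved as a module, it runs the same way as the commands above: `python -m <module_path>` (with a `unittest.main()` entry point) or via `python -m unittest`.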