
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).



About

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

JetStream Engine Implementation

Currently, two reference engine implementations are available -- one for JAX models and another for PyTorch models.

JAX

  • Git: https://github.com/google/maxtext
  • README: https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md

PyTorch

  • Git: https://github.com/google/jetstream-pytorch
  • README: https://github.com/google/jetstream-pytorch/blob/main/README.md

Documentation

JetStream Standalone Local Setup

Getting Started

Setup

pip install -r requirements.txt

Run and test a local server

Use the following commands to run a server locally:

# Start a server
python -m jetstream.core.implementations.mock.server

# Test local mock server
python -m jetstream.tools.requester

# Load test local mock server
python -m jetstream.tools.load_tester
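As a rough illustration of what a load test like the one above measures -- sustained requests per second under concurrency -- here is a generic, self-contained sketch. It does not use the JetStream API; `fake_generate` is a hypothetical stand-in for a real call to the mock server:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real inference request; the sleep
    # simulates per-request server latency.
    time.sleep(0.01)
    return prompt[::-1]


def load_test(num_requests: int = 100, concurrency: int = 8) -> float:
    """Issue num_requests concurrent calls and return achieved requests/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map fans requests out across worker threads.
        list(pool.map(fake_generate, [f"prompt {i}" for i in range(num_requests)]))
    elapsed = time.perf_counter() - start
    return num_requests / elapsed


if __name__ == "__main__":
    print(f"{load_test():.1f} req/s")
```

The real `jetstream.tools.load_tester` talks to the running mock server over its serving interface; this sketch only mirrors the measurement pattern.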

Test core modules

# Test JetStream core orchestrator
python -m jetstream.tests.core.test_orchestrator

# Test JetStream core server library
python -m jetstream.tests.core.test_server

# Test mock JetStream engine implementation
python -m jetstream.tests.engine.test_mock_engine

# Test mock JetStream token utils
python -m jetstream.tests.engine.test_utils
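The modules above are invoked as standard Python test modules. For orientation, a minimal test module of the same general shape looks like this -- the class and method names here are illustrative, not taken from the JetStream test suite:

```python
import unittest


class TokenEchoTest(unittest.TestCase):
    # Illustrative helper; a real engine test would exercise JetStream code.
    def echo_tokens(self, tokens):
        return list(tokens)

    def test_echo_roundtrip(self):
        # A round trip through the helper should preserve the token ids.
        self.assertEqual(self.echo_tokens([1, 2, 3]), [1, 2, 3])
```

Saved as a module, it runs the same way as the commands above: `python -m <module_path>` (with a `unittest.main()` entry point) or via `python -m unittest`.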