JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
- General cleanup of instructions
- Generalizing the Llama checkpoint-conversion script so that it supports custom GCS buckets for the MaxText checkpoints
- Adding quantization instructions
Currently, the model conversion script will [create a bucket](https://github.com/google/JetStream/blob/main/jetstream/tools/maxtext/model_ckpt_conversion.sh#L36) via `export MODEL_BUCKET=gs://${USER}-maxtext`. However, the `gs://${USER}-maxtext` path may already exist, which I imagine would break the script....
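
One possible fix is to make the bucket creation idempotent by checking for the bucket first. A minimal sketch follows; the `ensure_bucket` helper is illustrative and not part of the current script, though `gsutil ls -b` and `gsutil mb` are the standard commands for checking and creating buckets.

```python
# Illustrative guard (not in the current script): create the MaxText
# checkpoint bucket only when it does not already exist.
import os
import subprocess

def ensure_bucket(bucket: str) -> None:
    """Create a GCS bucket with gsutil unless it already exists."""
    exists = subprocess.run(
        ["gsutil", "ls", "-b", bucket], capture_output=True
    ).returncode == 0
    if not exists:
        subprocess.run(["gsutil", "mb", bucket], check=True)

ensure_bucket(f"gs://{os.environ['USER']}-maxtext")
```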
Great work! When will GPU support be released?
- Optimized TPU duty cycle (largest gap < 4 ms)
- Optimized TTFT: dispatch prefill tasks as soon as possible without unnecessary blocking on the CPU, keep backpressure to enforce inserts as soon as possible, and return the first token... (a backpressure sketch follows this list)
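
A minimal sketch of the bounded-queue backpressure pattern described above, assuming hypothetical prefill/insert helpers; the worker names, stubs, and queue bound are illustrative, not JetStream's actual orchestrator API.

```python
# Sketch (not JetStream's API): dispatch prefill as soon as requests
# arrive, and use a bounded queue as backpressure so that inserts into
# the decode batch keep pace with prefill.
import queue
import threading
import time

PENDING_LIMIT = 8  # assumed bound; in practice tuned to the decode batch

pending: "queue.Queue" = queue.Queue(maxsize=PENDING_LIMIT)

def run_prefill(request: str) -> tuple[str, dict]:
    """Stub prefill: returns (first_token, kv_cache)."""
    time.sleep(0.01)
    return request[:1], {"request": request}

def prefill_worker(requests: list[str]) -> None:
    for request in requests:
        first_token, kv_cache = run_prefill(request)
        print(f"first token for {request!r}: {first_token!r}")  # return TTFT early
        pending.put(kv_cache)  # blocks when the queue is full (backpressure)
    pending.put(None)  # sentinel: no more work

def insert_worker() -> None:
    while (kv_cache := pending.get()) is not None:
        # Insert the prefilled cache into the running decode batch ASAP.
        print(f"inserted cache for {kv_cache['request']!r}")

requests = [f"prompt-{i}" for i in range(4)]
producer = threading.Thread(target=prefill_worker, args=(requests,))
consumer = threading.Thread(target=insert_worker)
producer.start(); consumer.start()
producer.join(); consumer.join()
```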
Do not merge until https://github.com/google/JetStream/pull/127 lands.
I should be able to serve a model simply by providing the HuggingFace model ID. Requiring users to convert checkpoints first is too cumbersome.
I have analyzed the `request-rate` and `interval` variables in `benchmarking_script.py` and would like to check that my understanding is correct: the `request-rate`...
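
For context, serving benchmarks commonly derive the inter-arrival `interval` from `request-rate` by sampling an exponential distribution with mean `1/request_rate`, so that arrivals form a Poisson process. The sketch below illustrates that pattern; whether `benchmarking_script.py` does exactly this is an assumption to verify against the source.

```python
# Common request-rate / interval relationship in serving benchmarks:
# gaps between requests are exponentially distributed with mean
# 1/request_rate, i.e. Poisson arrivals. Illustrative, not quoted from
# benchmarking_script.py.
import asyncio
import random

async def generate_requests(prompts, request_rate: float):
    """Yield prompts with exponentially distributed gaps between them."""
    for prompt in prompts:
        yield prompt
        if request_rate == float("inf"):
            continue  # burst mode: send everything back to back
        interval = random.expovariate(request_rate)  # mean = 1/request_rate
        await asyncio.sleep(interval)

async def main():
    async for prompt in generate_requests(["p1", "p2", "p3"], request_rate=2.0):
        print("sent", prompt)

asyncio.run(main())
```

Under this reading, a higher `request-rate` shrinks the average `interval`, and an infinite rate degenerates to sending all requests at once.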