
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (with GPU support planned -- PRs welcome).

## JetStream issues (21 results, sorted by recently updated)

- Update the gRPC proto so that a request carries either a token id or text (one of the two), and a response carries a token id, text, or both. Currently, a request with either token id...
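
  The "one of" semantics above can be sketched as follows. This is a hypothetical illustration only; the message and field names below are not the actual JetStream proto, just a stand-in for a proto `oneof` where exactly one variant must be populated:

  ```python
  # Illustrative only: DecodeRequest and its fields are hypothetical,
  # mirroring a proto `oneof` (exactly one variant set per request).
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class DecodeRequest:
      text: Optional[str] = None
      token_ids: Optional[list] = None

  def which_oneof(req: DecodeRequest) -> str:
      """Return which variant of the oneof is populated, rejecting
      requests that set both or neither."""
      if (req.text is None) == (req.token_ids is None):
          raise ValueError("exactly one of text or token_ids must be set")
      return "text" if req.text is not None else "token_ids"
  ```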

- Customer request: we write clients in multiple languages and cannot implement detokenization in each one, so the server needs to support server-side detokenization.
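
  Server-side detokenization in this sense can be sketched with a toy vocabulary; a real server would call the model's own tokenizer (e.g. a SentencePiece model) rather than this hypothetical lookup table:

  ```python
  # Toy sketch only: TOY_VOCAB stands in for the model's real tokenizer.
  TOY_VOCAB = {1: "Hello", 2: ",", 3: " world"}

  def detokenize(token_ids):
      """Map generated token ids back to text on the server, so clients
      in any language receive plain text instead of raw token ids."""
      return "".join(TOY_VOCAB.get(t, "") for t in token_ids)
  ```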

- Issue: JetStream currently makes a few assumptions that hinder its generalization: 1. the tokenizer is SentencePiece-based; 2. `pad_id` is 0; 3. after encoding, we pad to the nearest power...
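
  Assumptions 2 and 3 above can be sketched together. This is a minimal illustration of padding a token sequence to the next power-of-two length with a default `pad_id` of 0 (function names are illustrative, not JetStream's actual API):

  ```python
  def next_power_of_two(n: int) -> int:
      # Smallest power of two >= n (assumes n >= 1).
      return 1 << (n - 1).bit_length()

  def pad_tokens(token_ids, pad_id=0):
      """Pad a token sequence to the next power-of-two length,
      matching the behavior the issue describes (pad_id defaults to 0)."""
      target = next_power_of_two(len(token_ids))
      return list(token_ids) + [pad_id] * (target - len(token_ids))
  ```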

- Command: `python benchmarks/benchmark_serving.py --tokenizer /home//data/tokenizer.model --num-prompts 300 --dataset-path /home//data/ShareGPT_V3_unfiltered_cleaned_split.json --dataset sharegpt --save-request-outputs`

  Logs:

  > File "/home//JetStream/benchmarks/benchmark_serving.py", line 778, in
  >     main(parsed_args)
  > File "/home//JetStream/benchmarks/benchmark_serving.py", line 574, in main
  > ...

- JetStream is meant to be a framework-independent (JAX or PyTorch) inference stack, but the current code base is bound to jax arrays. Please support NumPy padding.
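
  The request above amounts to doing padding with plain NumPy instead of `jax.numpy`, so the core stack does not pull in a JAX dependency. A minimal sketch (the function name is illustrative, not JetStream's API):

  ```python
  import numpy as np

  def pad_batch_np(token_ids, target_len, pad_id=0):
      """Pad a 1-D token array to target_len using NumPy rather than
      jax.numpy, keeping JetStream core framework-independent."""
      ids = np.asarray(token_ids)
      return np.pad(ids, (0, target_len - ids.shape[0]),
                    constant_values=pad_id)
  ```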

- There is JAX-specific code scattered throughout JetStream; we should move the JAX-related code into the engine implementations and remove the JAX dependency from JetStream itself. In the end, JetStream is an orchestrator for PyTorch and...
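
  The separation this issue asks for can be sketched as an abstract engine interface: JetStream core talks only to the interface, while JAX (or PyTorch) code lives entirely inside concrete engine implementations. Class and method names here are hypothetical, not JetStream's actual engine API:

  ```python
  # Hypothetical sketch: a framework-neutral orchestrator boundary.
  import abc

  class Engine(abc.ABC):
      @abc.abstractmethod
      def prefill(self, token_ids: list) -> object:
          """Run prefill and return an opaque decode state."""

      @abc.abstractmethod
      def generate(self, state: object) -> int:
          """Produce the next token id from the decode state."""

  class EchoEngine(Engine):
      # Trivial stand-in implementation, used only for this example;
      # a real engine would wrap JAX or PyTorch model code here.
      def prefill(self, token_ids):
          return list(token_ids)

      def generate(self, state):
          return state[-1]
  ```

  With this split, the orchestrator never imports JAX: it only passes opaque state between `prefill` and `generate`.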