Fanhai Lu
Command: `python benchmarks/benchmark_serving.py --tokenizer /home//data/tokenizer.model --num-prompts 300 --dataset-path /home//data/ShareGPT_V3_unfiltered_cleaned_split.json --dataset sharegpt --save-request-outputs` Logs: > File "/home//JetStream/benchmarks/benchmark_serving.py", line 778, in > main(parsed_args) > File "/home//JetStream/benchmarks/benchmark_serving.py", line 574, in main >...
JetStream is meant to be a framework-independent inference stack (JAX or PyTorch), but the current code base is bound to JAX arrays. Please support NumPy (np) padding.
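A minimal sketch of what NumPy-based padding could look like, assuming prompts are padded to fixed bucket lengths. The function name `pad_tokens_np`, the bucket sizes, and the default `pad_id` are hypothetical and not taken from the JetStream code base:

```python
import numpy as np

# Hypothetical bucket sizes for illustration; not taken from JetStream.
DEFAULT_BUCKETS = (16, 32, 64, 128, 256, 512, 1024)


def pad_tokens_np(token_ids, pad_id=0, buckets=DEFAULT_BUCKETS):
    """Pad token_ids to the smallest bucket that fits, using NumPy only.

    Inputs longer than the largest bucket are truncated to it.
    Returns (padded_tokens, true_length).
    """
    tokens = np.asarray(token_ids, dtype=np.int32)
    true_length = min(tokens.shape[0], buckets[-1])
    tokens = tokens[:true_length]
    # Pick the first bucket whose size is >= the number of real tokens.
    padded_length = next(b for b in buckets if b >= true_length)
    padded = np.full((padded_length,), pad_id, dtype=np.int32)
    padded[:true_length] = tokens
    return padded, true_length
```

Because the result is a plain `np.ndarray`, a JAX or PyTorch engine can convert it to its own array type on its side, which keeps the orchestrator free of framework-specific types.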
There is JAX-specific code in multiple places in JetStream. We should move the JAX-related code into the engine implementation and remove the JAX dependencies from JetStream. In the end, JetStream is an orchestrator for PyTorch and...
We saw a few service breakages and performance degradations in the last two weeks. It is hard to identify which PR caused an issue when the issue is found a few days later. Right...
Right now, the Ray engine returns the interleaved engine and a tuple separately. In the end, we would like to return a stable list of tuples for both of them.