maxtext
Add enable_model_warmup flag for AOT compilation at model server start
Add the enable_model_warmup flag at model server start.
Associated PR: https://github.com/google/JetStream/pull/92
- model_name=gemma-7b
- tokenizer_path=assets/tokenizer.gemma
- per_device_batch_size=1
- max_prefill_predict_length=1024
- max_target_length=2048
- async_checkpointing=false
- ici_fsdp_parallelism=1
- ici_autoregressive_parallelism=-1
- ici_tensor_parallelism=1
- scan_layers=false
- weight_dtype=bfloat16
- load_parameters_path=<ckpt_path>
- enable_model_warmup=true
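The arguments above are key=value overrides. As a minimal sketch of how such a list could be read and the new flag checked, the snippet below uses two hypothetical helpers, parse_overrides and as_bool. They are illustrations only and are not MaxText's or JetStream's actual config parser. Treating a missing flag as false is also an assumption of this sketch.

```python
def parse_overrides(args):
    """Turn a list of "key=value" strings into a dict of string values.

    Hypothetical helper for illustration; not the real MaxText parser.
    """
    overrides = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides


def as_bool(value):
    """Interpret the string "true" (any case) as True, anything else as False."""
    return value.strip().lower() == "true"


# A subset of the arguments listed above.
args = [
    "model_name=gemma-7b",
    "per_device_batch_size=1",
    "max_prefill_predict_length=1024",
    "enable_model_warmup=true",
]

config = parse_overrides(args)
# Assumption: warmup is off when the flag is absent.
warmup = as_bool(config.get("enable_model_warmup", "false"))
```

With enable_model_warmup=true in the list, warmup evaluates to True, which is the case this change targets.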
curl --request POST --header "Content-type: application/json" -s localhost:8000/generate --data '{
"prompt": "What are the top 5 programming languages",
"max_tokens": 200
}'
{
"response": " for data science in 2023?\n\n1. Python\n2. R\n3. SQL\n4. Java\n5. Scala\n\n**Note:** The order is based on popularity and demand in the data science industry in 2023."
}
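The same request can be issued from Python. The sketch below mirrors the curl call above using only the standard library. The function names build_generate_request and generate are hypothetical, and the URL assumes the server is listening on localhost:8000 as in the curl example.

```python
import json
import urllib.request


def build_generate_request(prompt, max_tokens,
                           url="http://localhost:8000/generate"):
    """Build a POST request equivalent to the curl command (not sent yet)."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


def generate(prompt, max_tokens):
    """Send the request and return the "response" field of the JSON reply.

    Assumes the reply has the shape shown above: {"response": "..."}.
    """
    req = build_generate_request(prompt, max_tokens)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Building the request does not contact the server.
req = build_generate_request("What are the top 5 programming languages", 200)
```

Calling generate("What are the top 5 programming languages", 200) against a running server would return the completion text. With warmup enabled, the first call should not have to wait for compilation, since that work is done at server start.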