Add enable_model_warmup flag for AOT compilation at model server start
This PR adds an enable_model_warmup flag that runs ahead-of-time (AOT) compilation of the model when the model server starts.
Associated PR: https://github.com/google/JetStream/pull/92

Example model server config with warmup enabled:
- model_name=gemma-7b
- tokenizer_path=assets/tokenizer.gemma
- per_device_batch_size=1
- max_prefill_predict_length=1024
- max_target_length=2048
- async_checkpointing=false
- ici_fsdp_parallelism=1
- ici_autoregressive_parallelism=-1
- ici_tensor_parallelism=1
- scan_layers=false
- weight_dtype=bfloat16
- load_parameters_path=<ckpt_path>
- enable_model_warmup=true
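For reference, these flags would be passed on the model server launch command along the lines of the following (a sketch assuming MaxText's maxengine_server.py entry point and the base.yml config):

python MaxText/maxengine_server.py MaxText/configs/base.yml \
  model_name=gemma-7b tokenizer_path=assets/tokenizer.gemma per_device_batch_size=1 \
  max_prefill_predict_length=1024 max_target_length=2048 async_checkpointing=false \
  ici_fsdp_parallelism=1 ici_autoregressive_parallelism=-1 ici_tensor_parallelism=1 \
  scan_layers=false weight_dtype=bfloat16 load_parameters_path=<ckpt_path> \
  enable_model_warmup=true

Once the server is up, a sample request and response: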
curl --request POST --header "Content-type: application/json" -s localhost:8000/generate --data '{
"prompt": "What are the top 5 programming languages",
"max_tokens": 200
}'
{
"response": " for data science in 2023?\n\n1. Python\n2. R\n3. SQL\n4. Java\n5. Scala\n\n**Note:** The order is based on popularity and demand in the data science industry in 2023."
}
Something happened when trying to squash the commits, so I created another PR. The old one is here: https://github.com/google/maxtext/pull/763. Per the discussion in that PR, we should keep the else False to prevent the model warmup logic from running regardless of the flag value. @gobbleturk
Sure, that's fine. I haven't seen anyone use blank/None for the configs yet, but I suppose this gives us some default behavior for that case...
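For illustration only, the default being discussed amounts to something like the minimal Python sketch below; resolve_enable_model_warmup is a hypothetical helper, not the exact MaxText code:

# Hypothetical sketch: a blank/None config value falls back to False,
# so warmup never runs unless the flag is explicitly enabled.
def resolve_enable_model_warmup(config_value):
  # Treat None or "" (flag left blank) the same as false.
  return bool(config_value) if config_value is not None else False

assert resolve_enable_model_warmup(True) is True
assert resolve_enable_model_warmup(None) is False
assert resolve_enable_model_warmup("") is False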