Add enable_model_warmup flag for AOT compilation at model server start
This PR adds an enable_model_warmup flag that runs ahead-of-time (AOT) compilation of the model when the model server starts.
Associated PR: https://github.com/google/JetStream/pull/92

Example model server config with warmup enabled:
- model_name=gemma-7b
- tokenizer_path=assets/tokenizer.gemma
- per_device_batch_size=1
- max_prefill_predict_length=1024
- max_target_length=2048
- async_checkpointing=false
- ici_fsdp_parallelism=1
- ici_autoregressive_parallelism=-1
- ici_tensor_parallelism=1
- scan_layers=false
- weight_dtype=bfloat16
- load_parameters_path=<ckpt_path>
- enable_model_warmup=true
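For reference, these flags would be passed on the model server launch command along the lines of the following (a sketch assuming MaxText's maxengine_server.py entry point and the base.yml config):

python MaxText/maxengine_server.py MaxText/configs/base.yml \
  model_name=gemma-7b tokenizer_path=assets/tokenizer.gemma per_device_batch_size=1 \
  max_prefill_predict_length=1024 max_target_length=2048 async_checkpointing=false \
  ici_fsdp_parallelism=1 ici_autoregressive_parallelism=-1 ici_tensor_parallelism=1 \
  scan_layers=false weight_dtype=bfloat16 load_parameters_path=<ckpt_path> \
  enable_model_warmup=true

Once the server is up, a sample request and response: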
curl --request POST --header "Content-type: application/json" -s localhost:8000/generate --data '{
"prompt": "What are the top 5 programming languages",
"max_tokens": 200
}'
{
"response": " for data science in 2023?\n\n1. Python\n2. R\n3. SQL\n4. Java\n5. Scala\n\n**Note:** The order is based on popularity and demand in the data science industry in 2023."
}
Something happened when trying to squash the commits, so I created another PR. The old one is here: https://github.com/google/maxtext/pull/763. Per the discussion in that PR, we should keep the else False to prevent the model warmup logic from running regardless of the flag value. @gobbleturk
Sure, that's fine. I haven't seen anyone use blank/None for the configs yet, but I suppose this gives us some default behavior for that case...
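For illustration only, the default being discussed amounts to something like the minimal Python sketch below; resolve_enable_model_warmup is a hypothetical helper, not the exact MaxText code:

# Hypothetical sketch: a blank/None config value falls back to False,
# so warmup never runs unless the flag is explicitly enabled.
def resolve_enable_model_warmup(config_value):
  # Treat None or "" (flag left blank) the same as false.
  return bool(config_value) if config_value is not None else False

assert resolve_enable_model_warmup(True) is True
assert resolve_enable_model_warmup(None) is False
assert resolve_enable_model_warmup("") is False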