docker 0.29.0-pytorch-inf2 with meta-llama/Meta-Llama-3.1-8B-Instruct fails
Description
Unable to use the OpenAI-compatible endpoint; requests produce the error below.
Error Message
PyProcess W-100-model-stdout: The following parameters are not supported by neuron with rolling batch: {'frequency_penalty'}.
How to Reproduce?
Using Docker with the following configuration:

```
"image": "deepjavalibrary/djl-serving:0.29.0-pytorch-inf2",
"envVars": "AWS_NEURON_VISIBLE_DEVICES=ALL
OPTION_TENSOR_PARALLEL_DEGREE=max
HF_HOME=/tmp/.cache/huggingface
OPTION_MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
OPTION_ENTRYPOINT=djl_python.transformers_neuronx
OPTION_TRUST_REMOTE_CODE=true
SERVING_LOAD_MODELS=test::Python=/opt/ml/model
OPTION_ROLLING_BATCH=auto
OPTION_ENABLE_CHUNKED_PREFILL=true
OPTION_MAX_ROLLING_BATCH_SIZE=32
OPTION_N_POSITIONS=8192
OPTION_MAX_BATCH_DELAY=500
DJL_CACHE_DIR=/tmp/.cache/ ",
```
PyProcess W-100-model-stdout: The following parameters are not supported by neuron with rolling batch: {'frequency_penalty'}. This is just a warning. We would not fail because of this.
Do you have any other error messages in the log?
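Since the log line is only a warning about `frequency_penalty` being ignored by the neuron rolling-batch backend, one workaround is to drop that parameter from the request body before sending it. Below is a minimal client-side sketch; `UNSUPPORTED_PARAMS` and `sanitize_payload` are hypothetical names for illustration, not part of djl-serving's API.

```python
# Sketch: strip sampling parameters that the neuron rolling-batch
# backend warns about before sending an OpenAI-style request.
# These names are assumptions for this example, not DJL identifiers.
UNSUPPORTED_PARAMS = {"frequency_penalty"}

def sanitize_payload(payload: dict) -> dict:
    """Return a copy of the request body without parameters the
    backend would warn about; all other fields pass through."""
    return {k: v for k, v in payload.items() if k not in UNSUPPORTED_PARAMS}

request = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "frequency_penalty": 0.5,
    "max_tokens": 64,
}
clean = sanitize_payload(request)
# clean no longer contains "frequency_penalty"
```

The original request dict is left untouched; only the copy sent to the endpoint is filtered.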
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.