
Error running google/gemma-7b-it

yaronr opened this issue on Apr 30, 2024

Description

Running the djl-inference:0.27.0-neuronx-sdk2.18.1 container with the Hugging Face model google/gemma-7b-it fails.

Error Message

WARN PyProcess W-93-model-stderr: --- Logging error ---
WARN PyProcess W-93-model-stderr: Traceback (most recent call last):
WARN PyProcess W-93-model-stderr:   File "/tmp/.djl.ai/python/0.27.0/djl_python_engine.py", line 121, in run_server
WARN PyProcess W-93-model-stderr:     outputs = self.service.invoke_handler(function_name, inputs)
WARN PyProcess W-93-model-stderr:   File "/tmp/.djl.ai/python/0.27.0/djl_python/service_loader.py", line 29, in invoke_handler
WARN PyProcess W-93-model-stderr:     return getattr(self.module, function_name)(inputs)
WARN PyProcess W-93-model-stderr:   File "/tmp/.djl.ai/python/0.27.0/djl_python/transformers_neuronx.py", line 274, in handle
WARN PyProcess W-93-model-stderr:     _service.initialize(inputs.get_properties())
WARN PyProcess W-93-model-stderr:   File "/tmp/.djl.ai/python/0.27.0/djl_python/transformers_neuronx.py", line 124, in initialize
WARN PyProcess W-93-model-stderr:     self.set_configs(properties)
WARN PyProcess W-93-model-stderr:   File "/tmp/.djl.ai/python/0.27.0/djl_python/transformers_neuronx.py", line 85, in set_configs
WARN PyProcess W-93-model-stderr:     self.model_config = AutoConfig.from_pretrained(
WARN PyProcess W-93-model-stderr:   File "/usr/local/lib/python3.9/dist-packages/transformers/models/auto/configuration_auto.py", line 1098, in from_pretrained
WARN PyProcess W-93-model-stderr:     config_class = CONFIG_MAPPING[config_dict["model_type"]]
WARN PyProcess W-93-model-stderr:   File "/usr/local/lib/python3.9/dist-packages/transformers/models/auto/configuration_auto.py", line 795, in __getitem__
WARN PyProcess W-93-model-stderr:     raise KeyError(key)
WARN PyProcess W-93-model-stderr: KeyError: 'gemma'
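The KeyError comes from AutoConfig not recognizing the gemma model type, which suggests the transformers version bundled in the container predates Gemma support. A minimal sketch that reproduces the same lookup outside DJL Serving (assumes transformers is importable and that HUGGING_FACE_HUB_TOKEN is set, since google/gemma-7b-it is a gated repository):

```python
# Minimal sketch reproducing the failing lookup outside DJL Serving.
# Assumes transformers is installed and HUGGING_FACE_HUB_TOKEN is exported,
# because google/gemma-7b-it is a gated repository.
from transformers import AutoConfig
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# False on transformers releases that predate Gemma support.
print("gemma" in CONFIG_MAPPING.keys())

# Raises KeyError: 'gemma' when the installed transformers does not know the model type.
config = AutoConfig.from_pretrained("google/gemma-7b-it")
```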

How to Reproduce?

docker run \
  -e HUGGING_FACE_HUB_TOKEN=xxx \
  -e OPTION_ENTRYPOINT=djl_python.transformers_neuronx \
  -e OPTION_TRUST_REMOTE_CODE=true \
  -e SERVING_LOAD_MODELS=test::Python=/opt/ml/model \
  -e OPTION_TENSOR_PARALLEL_DEGREE=2 \
  -e OPTION_MODEL_ID=google/gemma-7b-it \
  --device /dev/neuron0 \
  -d --name djl-gemma -p 8080:8080 \
  763104351884.dkr.ecr.$AWS_REGION.amazonaws.com/djl-inference:0.27.0-neuronx-sdk2.18.1
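As a quick diagnostic, the transformers version inside the container (for example via docker exec into djl-gemma) can be compared against the first release with Gemma support; my understanding is that Gemma was added in transformers 4.38.0, but please verify against the transformers release notes:

```python
# Diagnostic sketch: compare the installed transformers version against the
# first release that (to my knowledge) includes Gemma support, 4.38.0.
import transformers
from packaging import version

installed = version.parse(transformers.__version__)
required = version.parse("4.38.0")
print(f"transformers {installed}; Gemma supported: {installed >= required}")
```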
