djl-serving
DeepSpeed streaming, max_length is ignored
serving.properties:
option.model_id=EleutherAI/gpt-neo-1.3B
option.task=text-generation
option.tensor_parallel_degree=2
option.dtype=fp16
option.enable_streaming=true
#option.enable_streaming=huggingface
engine=DeepSpeed
option.parallel_loading=true
curl command:
curl -X POST "http://localhost:8080/invocations" \
-H "content-type: application/json" \
-d '{"inputs": ["Large language model is"], "parameters": {"max_length": 2}}'
Expected 2 new tokens to be returned, but 50 tokens are returned.
This is not a valid input, since max_length is smaller than the input token count. You may want to use a value larger than the input token length, or use max_new_tokens
instead to avoid the input-token-size limitation.
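To illustrate the distinction the maintainer is drawing, here is a minimal sketch of the Hugging Face-style generation length semantics (an illustration only, not djl-serving's actual implementation; the helper name and the assumed prompt token count are hypothetical): max_length caps the total sequence including the prompt, while max_new_tokens caps only the newly generated tokens.

```python
from typing import Optional

def new_token_budget(input_len: int,
                     max_length: Optional[int] = None,
                     max_new_tokens: Optional[int] = None) -> int:
    """Return how many new tokens generation may produce (hypothetical helper)."""
    if max_new_tokens is not None:
        # max_new_tokens is independent of the prompt length.
        return max_new_tokens
    if max_length is not None:
        # max_length counts the prompt tokens toward the cap, so a value
        # smaller than the prompt length leaves no budget for new tokens.
        return max(0, max_length - input_len)
    return 0

# Assume the prompt "Large language model is" tokenizes to 5 tokens.
prompt_len = 5
print(new_token_budget(prompt_len, max_length=2))      # 0: cap already consumed by the prompt
print(new_token_budget(prompt_len, max_length=25))     # 20
print(new_token_budget(prompt_len, max_new_tokens=2))  # 2
```

Under these semantics, sending `"parameters": {"max_new_tokens": 2}` in the curl request should yield exactly 2 new tokens regardless of prompt length, which is what the maintainer recommends above.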
Tried max_length = 25; still the same result. We need to standardize the parameters as much as possible.