optimum-habana
Adaptive output and contextual dialogue capabilities of text-generation-inference
System Info
HL-SMI Version: hl-1.11.0-fw-45.1.1.1
Driver Version: 1.11.0-e6eb0fd
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Deploy the Llama-2-7b-chat-hf model with text-generation-inference. With the following command there is no adaptive output: the response always contains exactly max_new_tokens tokens instead of stopping when the answer is complete.
curl 127.0.0.1:8080/generate_stream -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":200}}' -H 'Content-Type: application/json'
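One thing worth trying: TGI's generate endpoint also accepts a `stop` parameter (a list of stop sequences) alongside `max_new_tokens`, which makes the server truncate generation early; otherwise generation should end at the model's EOS token. A minimal sketch that builds such a request payload (the chosen stop sequence is an assumption for this prompt, not a recommendation):

```python
import json

# Build a request payload for TGI's /generate endpoint.
# "stop" is a list of stop sequences; generation ends at the first stop
# sequence, the model's EOS token, or max_new_tokens, whichever comes first.
payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {
        "max_new_tokens": 200,
        "stop": ["\n\n"],  # assumption: blank line as a stop sequence
    },
}

body = json.dumps(payload)
print(body)

# To send it (requires the server running on 127.0.0.1:8080):
#   curl 127.0.0.1:8080/generate -X POST -d "$body" \
#        -H 'Content-Type: application/json'
```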
Also, how can chat functionality with context be implemented? Similar to GPT-4, the model should adaptively produce an appropriately sized response and carry on a dialogue with conversational context.
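Since the TGI server itself is stateless, contextual dialogue is typically handled client-side: keep the conversation history and re-send it, formatted with the model's chat template, on every turn. A minimal sketch for the Llama-2-chat template (the helper name and history structure are illustrative, not part of TGI):

```python
# Build a Llama-2-chat prompt from a running conversation history.
# Llama-2-chat expects turns formatted as:
#   <s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {answer} </s>
# The function name and history format here are illustrative.

def build_llama2_prompt(system, history, user_message):
    """history is a list of (user, assistant) turn pairs."""
    prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    for i, (user, assistant) in enumerate(history):
        if i == 0:
            # First turn shares the opening [INST] with the system prompt.
            prompt += f"{user} [/INST] {assistant} </s>"
        else:
            prompt += f"<s>[INST] {user} [/INST] {assistant} </s>"
    if history:
        prompt += f"<s>[INST] {user_message} [/INST]"
    else:
        prompt += f"{user_message} [/INST]"
    return prompt

history = [
    ("What is Deep Learning?",
     "Deep learning is a subfield of machine learning based on neural networks."),
]
prompt = build_llama2_prompt(
    "You are a helpful assistant.", history, "Give me one example application."
)
print(prompt)

# Send `prompt` as "inputs" to /generate, then append the model's reply to
# `history` before the next turn so the model sees the full conversation.
```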
Expected behavior
- adaptive output
- dialogue with context
@MLikeWater What do you mean exactly by adaptive output?