
Adaptive output and contextual dialogue capabilities of text-generation-inference

Open MLikeWater opened this issue 1 year ago • 1 comments

System Info
HL-SMI Version: hl-1.11.0-fw-45.1.1.1
Driver Version: 1.11.0-e6eb0fd

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I deployed the Llama-2-7b-chat-hf model through text-generation-inference, but there is no adaptive output when using the following command: generation does not stop early, and the output length always matches max_new_tokens.

curl 127.0.0.1:8080/generate_stream -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":200}}'     -H 'Content-Type: application/json'

Also, how can I implement chat functionality with context? Similar to GPT-4, the model should adaptively produce output of an appropriate length and be able to hold a dialogue that takes previous turns into account.
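For the context part, one common approach (a minimal sketch, not an official optimum-habana or TGI feature) is to keep the conversation history client-side and rebuild the prompt on every request using Llama-2-chat's `[INST]`/`[/INST]` convention. The `build_llama2_prompt` helper below is an assumption-laden simplification of that template; the `/generate` endpoint and `max_new_tokens` parameter match the text-generation-inference REST API, but the server URL is just the one from the curl example above.

```python
# Sketch: client-side multi-turn chat against a TGI /generate endpoint.
# Prompt format approximates Llama-2-chat's [INST] convention; adapt as needed.
import json
import urllib.request


def build_llama2_prompt(history, user_message, system=None):
    """Concatenate prior (user, assistant) turns into a single Llama-2 chat prompt."""
    sys_block = f"<<SYS>>\n{system}\n<</SYS>>\n\n" if system else ""
    prompt = ""
    for i, (user, assistant) in enumerate(history):
        prefix = sys_block if i == 0 else ""  # system prompt only in the first turn
        prompt += f"<s>[INST] {prefix}{user} [/INST] {assistant} </s>"
    prefix = sys_block if not history else ""
    prompt += f"<s>[INST] {prefix}{user_message} [/INST]"
    return prompt


def generate(prompt, url="http://127.0.0.1:8080/generate", max_new_tokens=200):
    """POST the prompt to a running TGI server and return the generated text."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]


# Usage idea: append each (question, answer) pair to `history`, so the next
# request carries the full conversation, e.g.:
#   history = [("What is Deep Learning?", first_reply)]
#   generate(build_llama2_prompt(history, "Can you give an example?"))
```

The model itself is stateless; "context" comes entirely from resending the accumulated history in the prompt, so long conversations eventually need truncation to stay within the model's context window.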

Expected behavior

  1. adaptive output
  2. dialogue with context

MLikeWater avatar Sep 26 '23 06:09 MLikeWater

@MLikeWater What do you mean exactly by adaptive output?

regisss avatar Oct 13 '23 16:10 regisss