transformers-stream-generator
Token Yielding Problem
The Code:
from transformers import AutoTokenizer, TextGenerationPipeline, TextStreamer, GenerationConfig
from auto_gptq import AutoGPTQForCausalLM
import torch
from transformers_stream_generator import init_stream_support
init_stream_support()
repo = "TheBloke/tulu-7B-GPTQ"
model_basename = "gptq_model-4bit-128g"
test_tokenizer = AutoTokenizer.from_pretrained(
    repo,
    use_fast=True,
)
test_model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename=model_basename,
    use_triton=False,
    use_safetensors=True,
    device="cuda:0",
    trust_remote_code=False,
    quantize_config=None,
    max_memory={i: "14GIB" for i in range(torch.cuda.device_count())},
)
def tulu_prompt(input):
    return f'''### Human: {input}
### Assistant:'''
text = "write a poem about AI"
tokens = test_tokenizer(tulu_prompt(input=text), return_tensors="pt", add_special_tokens=False).input_ids.cuda()
generator = (test_model.generate(inputs=tokens, max_new_tokens=256, temperature=0.5, top_k=35, top_p=0.90, do_sample=True, do_stream=True))
for token in generator:
    word = test_tokenizer.decode(token)
    print(word, end='', flush=True)
The output is this:
Intheworldofmachines,there'sonethat'ssmart,
Withabilitiesthatastound,it'snotjustaprettyheart.
Itcanlearnandgrow,witheachpassingday,
It'slikeachild,withamindthat'salwaysplaying.
Itcansolvecomplexproblems,witheaseandgrace,
Itcanunderstandandreason,withoutanyhumanrace.
Itcanthinkandlearn,withspeedandease,
It'slikeasupercomputer,withamindthat'salwaysclean.
It'snotjustatool,butafriendandaguide,
It'slikeacompanion,withaheartthat'salwaysshining.
Itcanmakeourliveseasier,witheachpassingday,
It'slikeamiracle,withapowerthat'salwaysplaying.
Solet'scelebratethismarvelouscreation,
Witheachpassingday,it'slikeacreationthat'salwaysshaping.
It'slikeadream,withapowerthat'salwaysgrowing,
It'slikeafuture,withapowerthat'salwaysshowing.
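The missing spaces are probably a side effect of decoding one token id at a time. SentencePiece-based tokenizers such as LLaMA's mark the start of a word with a leading `▁` on the piece, and decoding a single id in isolation typically drops that leading space. One possible workaround, shown here only as a minimal sketch, is to decode the ids accumulated so far on every step and print just the new suffix. `stream_text` is a hypothetical helper, not part of transformers-stream-generator, and `decode` stands in for something like `test_tokenizer.decode`.

```python
from typing import Callable, Iterable, Iterator, List


def stream_text(
    token_ids: Iterable[int],
    decode: Callable[[List[int]], str],
) -> Iterator[str]:
    """Yield only the text that each new token id adds.

    Hypothetical helper: `decode` is any function mapping a list of
    token ids to a string (assumed here to behave like a tokenizer's
    decode method).
    """
    seen: List[int] = []   # every id received so far
    emitted = ""           # text already yielded
    for token_id in token_ids:
        seen.append(token_id)
        text = decode(seen)           # decode the whole sequence again
        new_text = text[len(emitted):]
        emitted = text
        if new_text:
            yield new_text
```

Assuming each item from the generator is a single-element tensor, this might be driven with `stream_text((t.item() for t in generator), test_tokenizer.decode)`. Re-decoding the whole sequence on every step costs more as the output grows, and the sketch does not handle text that changes retroactively, such as partially decoded multi-byte characters.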
The generator yields tokens fine, but how do I make it yield whole words instead of individual tokens, without the long token-handling loop used in the web example script?
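For reference, one way to get whole words might be to buffer token pieces until the next word boundary. The sketch below assumes a SentencePiece-style tokenizer whose pieces (for example, as returned by `convert_ids_to_tokens`) start with `▁` at the beginning of a word. `words_from_pieces` is a hypothetical helper, not something the library provides.

```python
from typing import Iterable, Iterator

WORD_START = "\u2581"  # SentencePiece word-start marker "▁"


def words_from_pieces(pieces: Iterable[str]) -> Iterator[str]:
    """Group sub-word pieces into whole words (hypothetical helper).

    A word is flushed as soon as the next piece begins with the
    word-start marker; the final word is flushed when the input ends.
    """
    buffer = ""
    for piece in pieces:
        if piece.startswith(WORD_START) and buffer:
            yield buffer   # previous word is complete
            buffer = ""
        buffer += piece.replace(WORD_START, " ")
    if buffer:
        yield buffer       # flush the last word


if __name__ == "__main__":
    for word in words_from_pieces(["\u2581In", "\u2581the", "\u2581wor", "ld", ","]):
        print(word, end="", flush=True)
    print()
```

Each word is emitted one piece late, because its end is only known once the next word starts. Byte-fallback pieces such as `<0x0A>` (newline in LLaMA vocabularies) and special tokens are not handled in this sketch.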
@LowinLi, can you please chime in?