GenerativeAIExamples

langchain_nvidia_trt not working

Open rbgo404 opened this issue 1 year ago • 3 comments

I have gone through the notebooks but was not able to stream tokens from TensorRT-LLM. Here's the issue: (screenshot attached)

Code used:

from langchain_nvidia_trt.llms import TritonTensorRTLLM
import time

# Connection and sampling parameters for the Triton-hosted TensorRT-LLM model
triton_url = "localhost:8001"
pload = {
    'tokens': 300,
    'server_url': triton_url,
    'model_name': "ensemble",
    'temperature': 1.0,
    'top_k': 1,
    'top_p': 0,
    'beam_width': 1,
    'repetition_penalty': 1.0,
    'length_penalty': 1.0,
}
client = TritonTensorRTLLM(**pload)

# Llama 2 chat prompt template
LLAMA_PROMPT_TEMPLATE = (
    "<s>[INST] <<SYS>>"
    "{system_prompt}"
    "<</SYS>>"
    "[/INST] {context} </s><s>[INST] {question} [/INST]"
)
system_prompt = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Please ensure that your responses are positive in nature."
context = ""
question = 'What is the fastest land animal?'
prompt = LLAMA_PROMPT_TEMPLATE.format(system_prompt=system_prompt, context=context, question=question)

# Stream tokens from the Triton server and measure throughput
start_time = time.time()
tokens_generated = 0

for val in client._stream(prompt):
    tokens_generated += 1
    print(val, end="", flush=True)

total_time = time.time() - start_time
print(f"\n--- Generated {tokens_generated} tokens in {total_time} seconds ---")
print(f"--- {tokens_generated/total_time} tokens/sec")

rbgo404 · Apr 19 '24 10:04

Please share the configuration on the TensorRT-LLM end. What parameter modifications are required in the model's config.pbtxt?
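
(For reference, token streaming through the Triton TensorRT-LLM backend generally requires the decoupled transaction policy to be enabled for the tensorrt_llm model. A minimal sketch of the relevant config.pbtxt excerpt, assuming a recent tensorrtllm_backend layout:)

# tensorrt_llm/config.pbtxt (excerpt): enable decoupled mode so responses can be streamed
model_transaction_policy {
  decoupled: true
}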

rbgo404 · Apr 19 '24 10:04

Hey @rbgo404, you can deploy the TensorRT-based LLM model by following the steps here: https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html#using-local-gpus-for-a-q-a-chatbot

This notebook interacts with the model deployed behind the llm-inference-server container, which should be started up if you follow the steps above.

Let me know if you have any questions once you go through these steps!
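
Once the container is running, one quick check is to list which models its Triton instance is serving before pointing TritonTensorRTLLM at one of them. A sketch, assuming the default gRPC port 8001 is published on localhost and tritonclient is installed:

# Sketch: list the models served by the llm-inference-server's Triton instance
import tritonclient.grpc as grpcclient

grpc_client = grpcclient.InferenceServerClient(url="localhost:8001")
for model in grpc_client.get_model_repository_index().models:
    print(model.name, model.version, model.state)
grpc_client.close()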

shubhadeepd · Apr 22 '24 13:04

Hi, I followed the instructions but still have problems starting llm-inference-server. I'm currently using a Tesla M60 and llama-2-13b-chat. (Screenshot from 2024-04-30 23-08-17 attached)

ChiBerkeley · May 01 '24 06:05