
Incredibly unclear instructions

Open · BasedAnon opened this issue on Jan 11, 2024 · 3 comments

Was this posted with the intent of people actually using it? Which files make up the tokenizer, and where do I put them? Where is the .engine file? Has anyone actually gotten this to work, or is it fake?

BasedAnon · Jan 11 '24

I used the official installer and now it can't find the llama model 😃 My RTX 3060 has been running at 100% for 20 minutes and I have no idea what on earth it's doing. My computer is blowing out hot air (79°C); it works even better than my air conditioner :)

Jason-XII · Feb 16 '24

The tokenizer files come from here: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf. You can put them anywhere you like and change the config path to point at that location. To get the .engine file, you need to run build.py from the TensorRT-LLM repo. Clone it with the commands below; the script is at examples/llama/build.py, and there is an example invocation after the clone commands.

git clone --branch rel https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
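
From inside that checkout, the build looks roughly like this. This is only a sketch: the exact flags depend on your TensorRT-LLM version and on whether you quantize, and <path to Llama-2-13b-chat-hf> and <engine output dir> are placeholders you need to fill in.

cd examples/llama
python build.py --model_dir <path to Llama-2-13b-chat-hf> --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --enable_context_fmha --output_dir <engine output dir>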

noahc1510 · Feb 18 '24

I'm with you. I can't figure out how to extract the reply so I can feed it into another input, or even into a text file somewhere. Ideally I'd like to load my data, ask Chat with RTX questions about it, and then feed the answers somewhere else.
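
If a plain text file would be enough, the workaround I'm considering (untested, and the flag names may differ between TensorRT-LLM versions) is to skip the UI entirely and call the llama example runner from the TensorRT-LLM repo mentioned above, redirecting its output:

python examples/llama/run.py --engine_dir <engine dir> --tokenizer_dir <tokenizer dir> --input_text "your question here" --max_output_len 256 > reply.txt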

KnowhereFern · Feb 20 '24

Please use the updated instructions: https://github.com/NVIDIA/ChatRTX/blob/release/0.3/README.md
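
That README is on the release/0.3 branch of the renamed ChatRTX repo; to grab that branch locally:

git clone --branch release/0.3 https://github.com/NVIDIA/ChatRTX.git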

anujj · May 23 '24