trt-llm-rag-windows
Incredibly unclear instructions
Was this posted with the intent of people actually using it? Which files make up the tokenizer, and where do I put them? Where is the .engine file? Has anyone actually gotten this to work, or is it fake?
I used the official installer and now it can't find the llama model 😃 My RTX 3060 GPU has been running at 100% for 20 minutes and I don't know what on earth it's doing. My computer is blowing out hot air (79 °C) and works even better than my air conditioner :)
The tokenizer file is from here: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
You can put it anywhere you like and change the path in the configuration to match.
git clone --branch rel https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
I'm with you; I can't figure out how to extract the reply to feed it into another input, or even into a text file somewhere. Ideally I'd like to load my data, ask Chat with RTX questions about it, and then feed the answers somewhere else.
Please use the updated instructions: https://github.com/NVIDIA/ChatRTX/blob/release/0.3/README.md