trt-llm-rag-windows
Incredibly unclear instructions
Was this posted with the intent of people actually using it? Which files make up the tokenizer, and where do I put them? Where is the .engine file? Has anyone actually gotten this to work, or is it fake?
I used the official installer and now it can't find the llama model 😃 My RTX 3060 GPU has been running at 100% for 20 minutes and I don't know what on earth it's doing. My computer is blowing out hot air (79 °C) and works even better than my air conditioner :)
The tokenizer file is from here: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
You can put it anywhere you like and change the path in the configuration to match.
git clone --branch rel https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
I'm with you; I can't figure out how to extract the reply to feed it into another input, or even into a text file somewhere. Ideally I'd like to load my data, ask Chat with RTX questions about it, and then feed the answers somewhere else.
Please use the updated instructions: https://github.com/NVIDIA/ChatRTX/blob/release/0.3/README.md