TensorRT-LLM
Memory type of sampling params
The documentation describes the tensor data types expected by GptManager's InferenceRequest.
My question is: what kind of memory do these tensors need to live in? Pinned, pageable (host), or device memory? I can't find this information anywhere in the docs.
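
For concreteness, here is a minimal PyTorch sketch of the three memory kinds I mean. This is purely illustrative (the tensor names are hypothetical sampling params); which of these allocations InferenceRequest actually expects is exactly what I'm asking:

```python
import torch

# Example sampling-parameter tensors, created as plain CPU tensors.
# torch.tensor(...) returns pageable host memory by default.
temperature = torch.tensor([[0.7]], dtype=torch.float32)   # pageable host memory
top_k       = torch.tensor([[50]],  dtype=torch.int32)     # pageable host memory

# The two alternatives in question:
pinned_temperature = temperature.pin_memory()   # pinned (page-locked) host memory
device_temperature = temperature.cuda()         # device (GPU) memory
```

Should the tensors passed to InferenceRequest be built like the first case, the pinned case, or the device case?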