onnx-tensorrt
How can I speed up argmax via Tensorrt?
I use TensorRT to speed up my model, but performing argmax on the result (a NumPy array) on the CPU is too slow, so I want to run argmax() with TensorRT or on the GPU as well. I tried converting the result to a tensor and calling tensor.cuda().argmax(), but that fails with the error `../rtSafe/safeContext.cpp - cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)`. How should I solve this problem?
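One way to avoid the host round trip is to let TensorRT write its output directly into a PyTorch CUDA tensor and run argmax there. The CUDNN_STATUS_MAPPING_ERROR often points at two libraries contending for the CUDA context; initializing PyTorch's CUDA state before creating the TensorRT execution context (or sharing the same stream) is a common workaround, though your setup may differ. Below is a minimal sketch, assuming a single-input, single-output engine; `context` (an already-built IExecutionContext), `input_gpu`, and the output shape are placeholders for your own pipeline:

```python
import torch

# Hypothetical setup: `context` is an already-built TensorRT IExecutionContext
# and `input_gpu` is a torch CUDA tensor holding the network input. The output
# shape/dtype below are placeholders for an (N, C, H, W) segmentation logit map.
output_gpu = torch.empty((1, 19, 512, 1024), dtype=torch.float32, device="cuda")

# Pass raw device pointers as bindings so TensorRT writes straight into the
# torch tensor; the data never leaves the GPU.
bindings = [int(input_gpu.data_ptr()), int(output_gpu.data_ptr())]
context.execute_v2(bindings)

# argmax over the class axis runs on the GPU; only the small index map is
# copied back to host memory.
class_map = output_gpu.argmax(dim=1)
result = class_map.cpu().numpy()
```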
Can you provide a code snippet of your existing TensorRT workflow and how you are adding the argmax operator?
Any update on this issue? ArgMax from ONNX is converted to TopK in TensorRT, and TopK is too slow for segmentation output.
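For reference, moving the reduction into the network itself means appending an ArgMax node to the ONNX graph before building the engine, roughly as sketched below (the file and tensor names `model.onnx` and `logits` are hypothetical). As noted above, TensorRT lowers this ArgMax to a TopK layer, so for large segmentation outputs it may still hit the slow path being discussed:

```python
import onnx
from onnx import helper, TensorProto

# Hypothetical file and tensor names -- replace "model.onnx" and "logits"
# with the actual model path and segmentation output of your network.
model = onnx.load("model.onnx")
graph = model.graph

# Reduce over the channel/class axis of an (N, C, H, W) logit map.
argmax_node = helper.make_node(
    "ArgMax",
    inputs=["logits"],
    outputs=["class_map"],
    axis=1,
    keepdims=0,
)
graph.node.append(argmax_node)

# Replace the old float output with the new int64 class-index map.
del graph.output[:]
graph.output.extend([
    helper.make_tensor_value_info("class_map", TensorProto.INT64, None),
])

onnx.checker.check_model(model)
onnx.save(model, "model_argmax.onnx")
```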