tensorrt_inference
Using cudaMemcpyAsync directly rather than context->enqueueV2
Many thanks for this great repo! It's amazing and useful work that I'm learning from quite a bit.
I wanted to check whether there's an optimization reason why you chose to call cudaMemcpyAsync directly in mode.cpp rather than context->enqueueV2 as written in the documentation.
I'm still relatively new to deploying models in C++. Was this an optimization choice, or just personal coding style?
If I understand correctly, enqueueV2 is just a wrapper around the CUDA memcpy calls, so wouldn't using enqueueV2 or executeV2 be more maintainable in the long term? As TensorRT changes, the implementation behind those methods could change while the signatures stay the same.
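For context, this is roughly the pattern I had in mind from the documentation: a minimal sketch, assuming the TensorRT 7/8 bindings-style API, where `deviceInput`, `deviceOutput`, and the buffer sizes are placeholders of my own, not names from this repo.

```cpp
// Minimal sketch (not this repo's code) of how cudaMemcpyAsync and
// enqueueV2 are typically combined. Buffers are assumed to be
// pre-allocated with cudaMalloc; names here are placeholders.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

bool infer(nvinfer1::IExecutionContext* context,
           const std::vector<float>& hostInput,
           std::vector<float>& hostOutput,
           void* deviceInput, void* deviceOutput,
           cudaStream_t stream)
{
    // Copy the input tensor host -> device, asynchronously on `stream`.
    cudaMemcpyAsync(deviceInput, hostInput.data(),
                    hostInput.size() * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    // Enqueue inference on the same stream, passing the device pointers
    // in binding-index order.
    void* bindings[] = { deviceInput, deviceOutput };
    if (!context->enqueueV2(bindings, stream, nullptr))
        return false;

    // Copy the output tensor device -> host on the same stream.
    cudaMemcpyAsync(hostOutput.data(), deviceOutput,
                    hostOutput.size() * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    // Wait for the copies and the inference to finish.
    cudaStreamSynchronize(stream);
    return true;
}
```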