jetson-inference
Trying different models but getting the same output when inferencing
I modified the imageNet example to run a few models I made in Caffe. I'm running on a Jetson TX2 with JetPack 4.6.1 and the corresponding branch of jetson-inference, L4T-R32.7.1. The model loads and optimizes in FP16. But when I run inference on different input images, the output is exactly the same, as if the input were a constant value. This output does differ between models (so the models are running?). I also tried writing my own image loader using OpenCV and CUDA, as mentioned in #129, and got the same result.
The model class code is basically the same as imageNet. The detection program follows the my-recognition.cpp example:
```cpp
int main( int argc, char** argv )
{
	// retrieve the model and image paths from the command line args
	if( argc < 4 )
	{
		printf("usage: my-recognition <prototxt> <caffemodel> <image>\n");
		return 0;
	}

	const char* prototxt_path = argv[1];
	const char* snapshot_path = argv[2];
	const char* imgFilename   = argv[3];

	videoOutput* output = videoOutput::Create("./output.jpg");

	// these variables will store the image data pointer and dimensions
	uchar3* imgPtr    = NULL;	// shared CPU/GPU pointer to the image
	int     imgWidth  = 0;		// width of the image (in pixels)
	int     imgHeight = 0;		// height of the image (in pixels)

	// load the image from disk as uchar3 RGB (24 bits per pixel)
	if( !loadImage(imgFilename, &imgPtr, &imgWidth, &imgHeight) )
	{
		printf("failed to load image '%s'\n", imgFilename);
		return 0;
	}

	Net* net = Net::Create(prototxt_path, snapshot_path, NULL, "data", "softmax", 1U, TYPE_FP16);

	// check to make sure that the network model loaded properly
	if( !net )
	{
		printf("failed to load image recognition network\n");
		return 0;
	}

	// this variable will store the confidence of the classification (between 0 and 1)
	float confidence = 0.0f;

	// classify the image, return the object class index (or -1 on error)
	const int classIndex = net->Classify(imgPtr, imgWidth, imgHeight, &confidence);
	printf("Class predicted: %d %s\n", classIndex, net->classDetector(classIndex));

	if( output != NULL )
		output->Render(imgPtr, imgWidth, imgHeight);

	// free the network's resources before shutting down
	delete net;

	// this is the end of the example!
	return 0;
}
```
The custom image loader (I also tried with uchar3; no errors reported by CUDA):
```cpp
bool loadImageOpenCV( const char* path, float3** ptr, int* width, int* height )
{
	const std::string image_path = locateFile(path);

	if( image_path.length() == 0 )
	{
		LogError("Input not found\n");
		return false;
	}

	cv::Mat image = cv::imread(image_path, cv::IMREAD_COLOR);

	if( image.empty() )
	{
		LogError("Could not read image\n");
		return false;
	}

	cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
	image.convertTo(image, CV_32FC3, 1/255.0);

	*width  = image.cols;
	*height = image.rows;

	const size_t dpitch = (size_t)(*width) * sizeof(float3);	// row pitch in bytes
	const size_t imsize = dpitch * (size_t)(*height);

	cudaError_t ret = cudaMalloc((void**)ptr, imsize);

	if( ret != cudaSuccess )
	{
		LogError(CUDA_ERROR "Error allocating memory %d\n", ret);
		return false;
	}

	// note: the width argument of cudaMemcpy2D is in bytes, not pixels
	ret = cudaMemcpy2D((void*)*ptr, dpitch, (void*)image.data, image.step, dpitch, *height, cudaMemcpyHostToDevice);

	if( ret != cudaSuccess )
	{
		LogError(CUDA_ERROR "Error copying image to CUDA memory %d\n", ret);
		return false;
	}

	return true;
}
```
An output example
[image] loaded '1.jpg' (64x64, 3 channels)
[TRT] TensorRT version 8.2.1
[TRT] loading NVIDIA plugins...
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TFTRT_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FP16
[TRT] [MemUsageChange] Init CUDA: CPU +261, GPU +0, now: CPU 288, GPU 6846 (MiB)
[TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 288 MiB, GPU 6845 MiB
[TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 318 MiB, GPU 6875 MiB
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] attempting to open engine cache file ./model16/snapshot.caffemodel.1.1.8201.GPU.FP16.engine
[TRT] loading network plan from engine cache... ./model16/snapshot.caffemodel.1.1.8201.GPU.FP16.engine
[TRT] device GPU, loaded ./model16/snapshot.caffemodel
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 289, GPU 6875 (MiB)
[TRT] Loaded engine size: 0 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +141, now: CPU 456, GPU 7016 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +250, GPU +246, now: CPU 706, GPU 7262 (MiB)
[TRT] Deserialization required 1875580 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 706, GPU 7262 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 706, GPU 7262 (MiB)
[TRT] Total per-runner device persistent memory is 43008
[TRT] Total per-runner host persistent memory is 10640
[TRT] Allocated activation device memory of size 366080
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] binding 0
-- index 0
-- name 'data'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3
-- dim #1 64
-- dim #2 64
[TRT] binding 1
-- index 1
-- name 'softmax'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4
-- dim #1 1
-- dim #2 1
[TRT]
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=64 w=64) size=49152
[TRT] binding to output 0 softmax binding index: 1
[TRT] binding to output 0 softmax dims (b=1 c=4 h=1 w=1) size=16
[TRT]
[TRT] device GPU, ./model16/snapshot.caffemodel initialized.
class 0: 0.013490
class 1: 0.017221
class 2: 0.307926
class 3: 0.66136
Class predicted: 3
For this model the predicted class values will be the same no matter the input image.
Hi @matialmar, are you sure the pre-processing is what your model expects? https://github.com/dusty-nv/jetson-inference/blob/6cf3b12f503d64903ca1e77d4c5474d9b1513a4c/c/imageNet.cpp#L447
Hey @dusty-nv, there is no preprocessing needed. The expected input is a 24-bit RGB image.
@matialmar then you should change the preprocessing code to reflect that and recompile/reinstall it.