
Trying different models but getting the same output when inferencing

Open · matialmar opened this issue 2 years ago • 4 comments

I modified the imageNet example to run a few models I made in Caffe. I'm running on a Jetson TX2 with JetPack 4.6.1 and the corresponding branch of jetson-inference, L4T-R32.7.1. The model loads and gets optimized to FP16, but when I run inference on different input data the output is exactly the same (as if the input were a constant value). Of course this output is different for each model (so the models are running?). I also tried writing my own loader using OpenCV and CUDA as mentioned in #129 and got the same result.

The model class code is basically the same as imageNet.

The detection script (based on the my-recognition.cpp example):

int main( int argc, char** argv )
{

	// retrieve the model and image paths from the command line args
	if( argc < 4 )
	{
		printf("usage: my-recognition <prototxt> <caffemodel> <image>\n");
		return 0;
	}

	const char* prototxt_path = argv[1];
	const char* snapshot_path = argv[2];
	const char* imgFilename   = argv[3];

	videoOutput* output = videoOutput::Create("./output.jpg");
	 
	// these variables will store the image data pointer and dimensions
	uchar3* imgPtr = NULL;   // shared CPU/GPU pointer to image
	int imgWidth   = 0;      // width of the image (in pixels)
	int imgHeight  = 0;      // height of the image (in pixels)
	
	// load the image from disk as uchar3 RGB (24 bits per pixel)
	if( !loadImage(imgFilename, &imgPtr, &imgWidth, &imgHeight) )
	{
		printf("failed to load image '%s'\n", imgFilename);
		return 0;
	}
	

	Net* net = Net::Create(prototxt_path, snapshot_path, NULL, "data", "softmax", 1U, TYPE_FP16);

	// check to make sure that the network model loaded properly
	if( !net )
	{
		printf("failed to load image recognition network\n");
		return 0;
	}

	// this variable will store the confidence of the classification (between 0 and 1)
	float confidence = 0.0;

	// classify the image, return the object class index (or -1 on error)
	const int classIndex = net->Classify(imgPtr, imgWidth, imgHeight, &confidence);
	printf("Class predicted: %d %s\n", classIndex, net->classDetector(classIndex));
	// render the image to the output file
	if( output != NULL )
		output->Render(imgPtr, imgWidth, imgHeight);

	// free the network's resources before shutting down
	delete net;

	// this is the end of the example!
	return 0;
}
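
For reference, the snippet above takes the prototxt, caffemodel, and image paths as its first three arguments, so it would be launched roughly like this (the program name and the deploy.prototxt filename are placeholders; the other paths are the ones from the log below):

./my-recognition model16/deploy.prototxt model16/snapshot.caffemodel 1.jpg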

The custom image loader (I also tried with uchar3; no errors reported from CUDA):

bool loadImageOpenCV(const char * path, float3** ptr, int* width, int* height)
{
	const std::string image_path = locateFile(path);

	if( image_path.length() == 0)
	{
		LogError("Input not found\n");
		return false;
	}
	
	cv::Mat image = cv::imread(image_path, cv::IMREAD_COLOR);
	
	if(image.empty())
	{
		LogError("Could not read image\n");
		return false;
	}
	
	cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
	image.convertTo(image, CV_32FC3, 1/255.0);
	
	*width = image.cols;
	*height = image.rows;

	size_t imsize = (size_t)((*width) * (*height) * sizeof(float3));
	size_t dpitch = (size_t)((*width) * sizeof(float3));

	cudaError ret = cudaMalloc((void **)ptr, imsize);
	if( ret != cudaError::cudaSuccess)
	{
		LogError(CUDA_ERROR "Error allocating memory %d\n", ret);
		return false;
	}
	
	// copy the image to device memory; pitches and the copy width are in bytes, not pixels
	ret = cudaMemcpy2D((void *)*ptr, dpitch, (void *)image.data, image.step, dpitch, *height, cudaMemcpyHostToDevice);
	if( ret != cudaError::cudaSuccess)
	{
		LogError(CUDA_ERROR "Error copying image to CUDA memory %d\n", ret);
		return false;
	}

	return true;
}
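
As an aside, the loadImage() call used in the main snippet returns zero-copy memory that is shared between the CPU and GPU, whereas the loader here uses a device-only cudaMalloc() buffer. A minimal sketch of the same loader built on cudaAllocMapped() from jetson-utils could look like the following; treat the include paths, the function name, and the row-by-row copy as assumptions for illustration, not code from this thread:

#include <jetson-utils/cudaMappedMemory.h>   // cudaAllocMapped() (assumed install path)
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <cstring>

// hypothetical variant of the loader above using zero-copy memory,
// so the same pointer is valid on both the CPU and the GPU
bool loadImageMapped( const char* path, float3** ptr, int* width, int* height )
{
	cv::Mat image = cv::imread(path, cv::IMREAD_COLOR);

	if( image.empty() )
		return false;

	cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
	image.convertTo(image, CV_32FC3, 1/255.0);

	*width  = image.cols;
	*height = image.rows;

	const size_t imsize = (size_t)(*width) * (*height) * sizeof(float3);

	// zero-copy allocation shared between CPU and GPU (integrated memory on Jetson)
	if( !cudaAllocMapped((void**)ptr, imsize) )
		return false;

	// copy row-by-row in case OpenCV padded the rows
	for( int y=0; y < *height; y++ )
		memcpy((uint8_t*)*ptr + y * (*width) * sizeof(float3), image.ptr(y), (*width) * sizeof(float3));

	return true;
}

With mapped memory the pixels can be written directly on the CPU and read by TensorRT without an explicit cudaMemcpy.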

matialmar · May 12 '22 17:05

An example of the output:

[image]  loaded '1.jpg'  (64x64, 3 channels)
[TRT]    TensorRT version 8.2.1
[TRT]    loading NVIDIA plugins...
[TRT]    Registered plugin creator - ::GridAnchor_TRT version 1
[TRT]    Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT]    Registered plugin creator - ::NMS_TRT version 1
[TRT]    Registered plugin creator - ::Reorg_TRT version 1
[TRT]    Registered plugin creator - ::Region_TRT version 1
[TRT]    Registered plugin creator - ::Clip_TRT version 1
[TRT]    Registered plugin creator - ::LReLU_TRT version 1
[TRT]    Registered plugin creator - ::PriorBox_TRT version 1
[TRT]    Registered plugin creator - ::Normalize_TRT version 1
[TRT]    Registered plugin creator - ::ScatterND version 1
[TRT]    Registered plugin creator - ::RPROI_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1
[TRT]    Registered plugin creator - ::CropAndResize version 1
[TRT]    Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_TFTRT_TRT version 1
[TRT]    Registered plugin creator - ::Proposal version 1
[TRT]    Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT]    Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT]    Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT]    Registered plugin creator - ::Split version 1
[TRT]    Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT]    Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT]    detected model format - caffe  (extension '.caffemodel')
[TRT]    desired precision specified for GPU: FP16
[TRT]    [MemUsageChange] Init CUDA: CPU +261, GPU +0, now: CPU 288, GPU 6846 (MiB)
[TRT]    [MemUsageSnapshot] Begin constructing builder kernel library: CPU 288 MiB, GPU 6845 MiB
[TRT]    [MemUsageSnapshot] End constructing builder kernel library: CPU 318 MiB, GPU 6875 MiB
[TRT]    native precisions detected for GPU:  FP32, FP16
[TRT]    attempting to open engine cache file ./model16/snapshot.caffemodel.1.1.8201.GPU.FP16.engine
[TRT]    loading network plan from engine cache... ./model16/snapshot.caffemodel.1.1.8201.GPU.FP16.engine
[TRT]    device GPU, loaded ./model16/snapshot.caffemodel
[TRT]    [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 289, GPU 6875 (MiB)
[TRT]    Loaded engine size: 0 MiB
[TRT]    Using cublas as a tactic source
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +141, now: CPU 456, GPU 7016 (MiB)
[TRT]    Using cuDNN as a tactic source
[TRT]    [MemUsageChange] Init cuDNN: CPU +250, GPU +246, now: CPU 706, GPU 7262 (MiB)
[TRT]    Deserialization required 1875580 microseconds.
[TRT]    [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[TRT]    Using cublas as a tactic source
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 706, GPU 7262 (MiB)
[TRT]    Using cuDNN as a tactic source
[TRT]    [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 706, GPU 7262 (MiB)
[TRT]    Total per-runner device persistent memory is 43008
[TRT]    Total per-runner host persistent memory is 10640
[TRT]    Allocated activation device memory of size 366080
[TRT]    [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[TRT]    
[TRT]    CUDA engine context initialized on device GPU:

[TRT]       binding 0
                -- index   0
                -- name    'data'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  3
                -- dim #0  3
                -- dim #1  64
                -- dim #2  64
[TRT]       binding 1
                -- index   1
                -- name    'softmax'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  4
                -- dim #1  1
                -- dim #2  1
[TRT]    
[TRT]    binding to input 0 data  binding index:  0
[TRT]    binding to input 0 data  dims (b=1 c=3 h=64 w=64) size=49152
[TRT]    binding to output 0 softmax  binding index:  1
[TRT]    binding to output 0 softmax  dims (b=1 c=4 h=1 w=1) size=16
[TRT]    
[TRT]    device GPU, ./model16/snapshot.caffemodel initialized.
class 0: 0.013490
class 1: 0.017221
class 2: 0.307926
class 3: 0.66136
Class predicted: 3

For this model, the predicted class values are the same no matter the input image.

matialmar · May 12 '22 17:05

For this model, the predicted class values are the same no matter the input image.

Hi @matialmar, are you sure the pre-processing is what your model expects? https://github.com/dusty-nv/jetson-inference/blob/6cf3b12f503d64903ca1e77d4c5474d9b1513a4c/c/imageNet.cpp#L447

dusty-nv · May 12 '22 18:05

Hey @dusty-nv, there is no preprocessing needed. The expected input is a 24-bit RGB image.

matialmar · May 12 '22 18:05

@matialmar then you should change the code to reflect that and recompile/reinstall it

dusty-nv · May 13 '22 13:05
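
For anyone landing here later: the imageNet pre-processing dusty-nv links above typically applies mean subtraction / normalization chosen for the stock ImageNet models, which a custom Caffe network may not expect. A rough sketch of a "raw RGB" pre-processing step, converting the interleaved uchar3 image into the planar float CHW layout of the 'data' binding with no mean subtraction, might look like this (the kernel and launcher names are made up for illustration, and whether you scale by 1/255 depends on how the model was trained):

// hypothetical replacement pre-processing: pack interleaved uchar3 RGB (HWC)
// into planar float CHW in [0,1], with no mean subtraction
__global__ void preProcessRaw( uchar3* input, float* output, int width, int height )
{
	const int x = blockIdx.x * blockDim.x + threadIdx.x;
	const int y = blockIdx.y * blockDim.y + threadIdx.y;

	if( x >= width || y >= height )
		return;

	const uchar3 px = input[y * width + x];
	const int    n  = width * height;

	output[0 * n + y * width + x] = px.x / 255.0f;   // R plane
	output[1 * n + y * width + x] = px.y / 255.0f;   // G plane
	output[2 * n + y * width + x] = px.z / 255.0f;   // B plane
}

cudaError_t launchPreProcessRaw( uchar3* input, float* output, int width, int height, cudaStream_t stream )
{
	const dim3 block(8, 8);
	const dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

	preProcessRaw<<<grid, block, 0, stream>>>(input, output, width, height);
	return cudaGetLastError();
}

Where exactly this gets called depends on the custom Net class; in the stock imageNet it would replace the pre-processing step referenced in the linked line. If the network was trained on raw 0-255 values, drop the division by 255; the scale has to match the training-time transform.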