
Test the model with RGB_D camera

YuhanPeng1219 opened this issue • 15 comments

Hi, thanks for sharing your amazing model.

I am now trying to test the model with my RGB-D camera. However, I am a beginner in PyTorch, so I need some help to work through the code:

  1. I plan to feed the model with depth images, which are acquired from the camera with OpenNI and OpenCV. The shape of each frame is (112, 112, 3). If I want to detect and classify n frames in each iteration, what shape should the input be?

  2. What does "sample_duration" mean? What is the difference between "sample_duration_det" and the detector queue?

I am using the EgoGesture depth model.

YuhanPeng1219 avatar Jul 31 '19 03:07 YuhanPeng1219

Hi @YuhanPeng1219 ,

Please see my comment https://github.com/ahmetgunduz/Real-time-GesRec/issues/17#issuecomment-492430596, regarding how to feed a camera image to the models. It should give you an idea.

Besides, please see my answers below:

> I plan to feed the model with depth images, which are acquired from the camera with OpenNI and OpenCV. The shape of each frame is (112, 112, 3). If I want to detect and classify n frames in each iteration, what shape should the input be?

I see that you are somehow reading the wrong channels. A (112, 112, 3) image has the form (height, width, channel), and your channel count is 3, so it is either RGB or not properly cast to a depth image. What I would recommend is to use the PIL package to format your image after reading it with OpenCV. See the modes in PIL for more information: https://pillow.readthedocs.io/en/5.1.x/handbook/concepts.html#modes. The shape of a frame for the depth model must be a (1, 1, 1, 112, 112) torch.Tensor, which is annotated as (batch size, channel, sample duration, height, width). The sample duration is basically the number of frames.
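
To make the conversion concrete, here is a minimal sketch (not the repository's exact preprocessing; `frame_to_depth_tensor` is just an illustrative helper, and the real pipeline also applies the dataset's normalization, as discussed later in this thread):

```python
import numpy as np
import torch
from PIL import Image

def frame_to_depth_tensor(depth_frame, size=112):
    """depth_frame: numpy array from the camera, (H, W) or (H, W, 3), assumed 8-bit.
    Raw 16-bit depth would first need to be rescaled to 8-bit."""
    if depth_frame.ndim == 3:
        depth_frame = depth_frame[:, :, 0]           # keep one channel if it was replicated to 3
    img = Image.fromarray(depth_frame).convert('L')  # single-channel 8-bit, PIL mode 'L'
    img = img.resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    # (H, W) -> (batch, channel, sample duration, H, W) = (1, 1, 1, 112, 112)
    return torch.from_numpy(arr)[None, None, None, :, :]
```

Successive frames would then be concatenated along the sample-duration axis (dim 2) to build clips of length sample_duration_det or sample_duration_clf.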

What does "sample_duration" mean? What is the difference between "sample_duration_det" and detector queue.

It is the number of the frames in a video clip. *_det and *_clf are annotations for detector and classifier architecture, respectively.

Hope this makes sense, Ahmet

ahmetgunduz avatar Aug 01 '19 18:08 ahmetgunduz

Hi, @ahmetgunduz ,

Thanks for your reply. Actually, the frame shape is (112, 112, 3) because I wanted to use cv2.imshow() to display the depth image. I will try the PIL package then.

Also, when I feed the model with a (1, 1, 1, 112, 112) torch.Tensor, the detector model runs without error. However, the classifier model gives me this error:

```
inputs_clf = torch.Size([1, 1, 1, 112, 112])
Traceback (most recent call last):
  File "video_test.py", line 337, in <module>
    outputs_clf = classifier(inputs_clf)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/yuhan/e71469ea-82d6-49ff-926e-dab1993ad0a2/Real-time-GesRec/models/resnext.py", line 169, in forward
    x = self.avgpool(x)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 636, in forward
    self.padding, self.ceil_mode, self.count_include_pad)
RuntimeError: invalid argument 2: input image (T: 1 H: 4 W: 4) smaller than kernel size (kT: 2 kH: 4 kW: 4) at /pytorch/aten/src/THCUNN/generic/VolumetricAveragePooling.cu:57
```

Any suggestions?
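
For what it's worth, the pooling failure can be reproduced in isolation: a 3D average pooling with a temporal kernel of 2 cannot be applied to a feature map whose temporal size is already 1, which is what a 1-frame clip has shrunk to by the last layer. The channel count below is a placeholder and the exact error wording varies with the PyTorch version:

```python
import torch
import torch.nn as nn

# A 1-frame clip reaches the final pooling with T = 1 while H and W are 4,
# so an AvgPool3d kernel of (2, 4, 4) no longer fits in the temporal dimension.
pool = nn.AvgPool3d(kernel_size=(2, 4, 4))
features = torch.randn(1, 2048, 1, 4, 4)   # (batch, channels, T, H, W); 2048 is a placeholder
try:
    pool(features)
except RuntimeError as err:
    print(err)   # "input image ... smaller than kernel size" (message differs across versions)
```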

YuhanPeng1219 avatar Aug 03 '19 13:08 YuhanPeng1219

It is because of the sample_duration I think. Can you share your bash file?

ahmetgunduz avatar Aug 04 '19 15:08 ahmetgunduz

The bash file is as follows:

```bash
#!/bin/bash
python3 video_test.py \
    --root_path /media/yuhan/e71469ea-82d6-49ff-926e-dab1993ad0a2/Real-time-GesRec \
    --video_path video_screen \
    --annotation_path annotation_EgoGesture/egogestureall.json \
    --resume_path_det models/egogesture_resnetl_10_Depth_8.pth \
    --resume_path_clf models/egogesture_resnext_101_Depth_32.pth \
    --result_path results \
    --dataset egogesture \
    --sample_duration_det 8 \
    --sample_duration_clf 32 \
    --model_det resnetl \
    --model_clf resnext \
    --model_depth_det 10 \
    --model_depth_clf 101 \
    --resnet_shortcut_det A \
    --resnet_shortcut_clf B \
    --batch_size 1 \
    --n_classes_det 2 \
    --n_finetune_classes_det 2 \
    --n_classes_clf 83 \
    --n_finetune_classes_clf 83 \
    --n_threads 16 \
    --checkpoint 1 \
    --modality_det Depth \
    --modality_clf Depth \
    --n_val_samples 1 \
    --train_crop random \
    --test_subset test \
    --det_strategy median \
    --det_queue_size 4 \
    --det_counter 2 \
    --clf_strategy median \
    --clf_queue_size 16 \
    --clf_threshold_pre 0.6 \
    --clf_threshold_final 0.15 \
    --stride_len 2
```

But when the classifier model is used in "online_test.py", sample_duration is not involved:

```python
if opt.modality_clf == 'RGB':
    inputs_clf = inputs[:, :-1, :, :, :]
elif opt.modality_clf == 'Depth':
    inputs_clf = inputs[:, -1, :, :, :].unsqueeze(1)
elif opt.modality_clf == 'RGB-D':
    inputs_clf = inputs[:, :, :, :, :]

outputs_clf = classifier(inputs_clf)
outputs_clf = F.softmax(outputs_clf, dim=1)
outputs_clf = outputs_clf.cpu().numpy()[0].reshape(-1,)
```

YuhanPeng1219 avatar Aug 05 '19 00:08 YuhanPeng1219

Aha, my bad, @YuhanPeng1219! According to the (batch size, channel, sample duration, height, width) annotation, your input must be (1, 1, 32, 112, 112), since you are using egogesture_resnext_101_Depth_{32}.pth, where {} stands for the sample duration of the input frames. You also need to configure this code snippet, because by default inputs is in "RGB-D" format:

```python
if opt.modality_clf == 'RGB':
    inputs_clf = inputs[:, :-1, :, :, :]
elif opt.modality_clf == 'Depth':
    inputs_clf = inputs[:, -1, :, :, :].unsqueeze(1)
elif opt.modality_clf == 'RGB-D':
    inputs_clf = inputs[:, :, :, :, :]
```

And the detector accepts a sample_duration of 8 in that pretrained model.
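
As an illustration of that shape convention, here is a minimal sketch of buffering 32 depth frames and slicing the last 8 for the detector (the deque buffer and the names build_inputs and frame_tensor are assumptions for the example, not the script's actual code):

```python
from collections import deque

import torch

clip_len_clf, clip_len_det = 32, 8
buffer = deque(maxlen=clip_len_clf)          # rolling window of (112, 112) depth frames

def build_inputs(frame_tensor):
    """frame_tensor: (112, 112) float tensor for a single depth frame."""
    buffer.append(frame_tensor)
    if len(buffer) < clip_len_clf:
        return None, None                                 # not enough frames buffered yet
    clip = torch.stack(list(buffer), dim=0)               # (32, 112, 112)
    clip = clip[None, None]                               # (1, 1, 32, 112, 112) = (N, C, T, H, W)
    inputs_clf = clip
    inputs_det = clip[:, :, -clip_len_det:, :, :]         # last 8 frames go to the detector
    return inputs_det, inputs_clf
```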

ahmetgunduz avatar Aug 05 '19 18:08 ahmetgunduz

@ahmetgunduz Thanks for your help again. Now the program runs without error. However, the results are always zero:

```
Start Evaluation
det_selected_queue = [0. 0.]
pridiction_det = 0
prob_det = 0.0
inputs_clf = torch.Size([1, 1, 32, 112, 112])
clf result
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
img_size = (112, 112, 1)
```

Have you guys seen this before? I have used PIL to format my input images; the mode I chose is 'L'.

YuhanPeng1219 avatar Aug 07 '19 02:08 YuhanPeng1219

I think it is because prediction_det is always 0. If it is not 1, the classifier output will always be 0.
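
A rough sketch of that gating idea, heavily simplified (the real online_test.py additionally filters detector predictions over a queue with the median strategy from the bash file; the names below are illustrative only):

```python
import torch
import torch.nn.functional as F

def classify_if_active(detector, classifier, inputs_det, inputs_clf):
    # The detector is a binary "no gesture / gesture" model; the classifier is only
    # run when the detector predicts class 1, otherwise the per-class scores stay
    # at zero -- which matches the all-zero "clf result" printed above.
    outputs_det = F.softmax(detector(inputs_det), dim=1)
    prediction_det = int(outputs_det.argmax(dim=1))
    clf_scores = torch.zeros(83)                     # 83 EgoGesture classes
    if prediction_det == 1:
        clf_scores = F.softmax(classifier(inputs_clf), dim=1)[0]
    return prediction_det, clf_scores
```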

ahmetgunduz avatar Aug 07 '19 10:08 ahmetgunduz

@ahmetgunduz Thanks a lot for sharing your models. I also ran into trouble while testing the model with an RGB-D camera. Here I list my steps; are there any problems?

  1. I set the modality to 'Depth', get data from the depth camera, and resize it to (112, 112).
  2. I put the data of 32 frames into a list called A, so the shape of this list is (32, 112, 112).
  3. I reshape this list A to (1, 1, 32, 112, 112) and call it total_frames: total_frames = np.reshape(total_sample, (1, 1, sample_duration_clf, 112, 112)).
  4. I take the last eight frames and put them into the detector: inputs_det = torch.from_numpy(total_frames).float()[:, :, -sample_duration_det:, :, :] followed by outputs_det = detector(inputs_det).
  5. Finally, the raw output of the detector is wrong, and the result after softmax is 1 all the time. Part of the results are as follows:

```
outputs_det: tensor([[-71.0893, 82.3528]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-74.1505, 80.2766]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-72.9503, 80.4595]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-75.3347, 79.0474]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-76.2401, 78.2442]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-72.4066, 73.7980]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-73.1670, 74.1967]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-73.1590, 77.5848]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-79.0164, 74.5970]], grad_fn=<AddmmBackward>) prob_det: 1.0
outputs_det: tensor([[-78.1467, 78.0265]], grad_fn=<AddmmBackward>) prob_det: 1.0
```

I think there may be something wrong with the data I feed into the detector, but I can't figure it out. Any suggestion will help me a lot. Thanks again for sharing.
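
One arithmetic note on those numbers: with a logit gap of roughly 150, the softmax assigns essentially all probability to class 1, so prob_det printing 1.0 is just a consequence of the extreme logits rather than extra information about the input. A quick check in plain PyTorch:

```python
import torch
import torch.nn.functional as F

outputs_det = torch.tensor([[-71.0893, 82.3528]])   # first pair of logits reported above
prob_det = F.softmax(outputs_det, dim=1)[0, 1]
print(prob_det)   # tensor(1.) -- exp(-153.44) underflows to 0 in float32, so class 1 gets all the mass
```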

likestudy avatar Nov 05 '19 14:11 likestudy

Hi @likestudy, did you crop and normalize the input images using the spatial transformation defined here: https://github.com/ahmetgunduz/Real-time-GesRec/blob/cec7050b3cea2d3ec33967a1552d2022b89d4166/online_test.py#L147-L151 ? Other than that, everything looks fine. In online_test.py, I feed this transformation to the dataset, and you can apply it to your frames the same way it is done here: https://github.com/ahmetgunduz/Real-time-GesRec/blob/cec7050b3cea2d3ec33967a1552d2022b89d4166/datasets/egogesture_online.py#L231-L232
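
For reference, a rough sketch of the idea using torchvision.transforms as a stand-in for the repository's own spatial_transforms module (the crop size is taken from the thread, but the mean/std values below are placeholders, not the EgoGesture statistics):

```python
import torch
from torchvision import transforms

# Stand-in for the repo's Scale / CenterCrop / ToTensor / Normalize pipeline.
spatial_transform = transforms.Compose([
    transforms.Resize(112),
    transforms.CenterCrop(112),
    transforms.ToTensor(),                        # (1, 112, 112) in [0, 1] for a mode-'L' frame
    transforms.Normalize(mean=[0.5], std=[0.5]),  # placeholder mean/std
])

def transform_clip(pil_frames):
    """pil_frames: list of PIL images in mode 'L'; returns (1, 1, T, 112, 112)."""
    clip = [spatial_transform(img) for img in pil_frames]   # each (1, 112, 112)
    clip = torch.stack(clip, dim=1)                          # (1, T, 112, 112)
    return clip.unsqueeze(0)                                 # (1, 1, T, 112, 112)
```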

ahmetgunduz avatar Nov 06 '19 14:11 ahmetgunduz

@ahmetgunduz Thank you for your help. I tried your method, but the results still have not changed. I found that the background of the depth image in the dataset is black (picture above), but in the depth image I collected with the xiton camera (picture below), the hand in front is black and the background is not pure gray.

[attached images: "depth" (dataset sample) and "m1" (my camera frame)]

I think there may be a problem with the depth image I am using. It seems that the images in the dataset have been processed, but I can't find a way to turn my pictures into the correct form.
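
One possible explanation (an assumption, not something confirmed in this thread) is that the camera returns raw depth with near objects as small values while the dataset frames appear to have near objects bright; a common preprocessing step is to clip the raw depth to a working range, rescale it to 8-bit, and invert it if needed. A sketch, with a guessed range that would need tuning:

```python
import numpy as np

def depth_to_uint8(depth_mm, near=300, far=1500, invert=True):
    """Clip raw depth (e.g. millimetres from OpenNI) to [near, far] and rescale to 0-255.
    With invert=True, near objects come out bright. The near/far range is a guess."""
    d = np.clip(depth_mm.astype(np.float32), near, far)
    d = (d - near) / (far - near)       # 0 = near, 1 = far
    if invert:
        d = 1.0 - d                     # 1 = near, 0 = far
    return (d * 255).astype(np.uint8)   # ready for PIL mode 'L' or cv2.imshow
```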

likestudy avatar Nov 07 '19 08:11 likestudy

Hi @likestudy, have you figured out how to process the images?

niuwenju avatar May 27 '20 08:05 niuwenju

> Aha, my bad, @YuhanPeng1219! According to the (batch size, channel, sample duration, height, width) annotation, your input must be (1, 1, 32, 112, 112), since you are using egogesture_resnext_101_Depth_{32}.pth, where {} stands for the sample duration of the input frames. You also need to configure this code snippet, because by default inputs is in "RGB-D" format:
>
> ```python
> if opt.modality_clf == 'RGB':
>     inputs_clf = inputs[:, :-1, :, :, :]
> elif opt.modality_clf == 'Depth':
>     inputs_clf = inputs[:, -1, :, :, :].unsqueeze(1)
> elif opt.modality_clf == 'RGB-D':
>     inputs_clf = inputs[:, :, :, :, :]
> ```
>
> And the detector accepts a sample_duration of 8 in that pretrained model.

Hello, I have a question: why did you add the line

`inputs_clf = torch.Tensor(inputs_clf.numpy()[:,:,::2,:,:])`

in online_test.py? It causes the error: RuntimeError: invalid argument 2: input image (T: 1 H: 4 W: 4) smaller than kernel size (kT: 2 kH: 4 kW: 4)
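
For context, that slice subsamples the clip along the temporal axis by a factor of 2, so a 32-frame clip becomes 16 frames. A small illustration in plain PyTorch, independent of the script:

```python
import torch

inputs_clf = torch.randn(1, 1, 32, 112, 112)   # (N, C, T, H, W)
subsampled = inputs_clf[:, :, ::2, :, :]       # keep every second frame
print(subsampled.shape)                        # torch.Size([1, 1, 16, 112, 112])
# If the clip fed in is already short, the temporal dimension can end up smaller
# than the classifier's average-pooling kernel, which matches the error above.
```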

kinfeparty avatar Jun 21 '20 14:06 kinfeparty

> > Aha, my bad, @YuhanPeng1219! According to the (batch size, channel, sample duration, height, width) annotation, your input must be (1, 1, 32, 112, 112), since you are using egogesture_resnext_101_Depth_{32}.pth, where {} stands for the sample duration of the input frames. You also need to configure this code snippet, because by default inputs is in "RGB-D" format:
> >
> > ```python
> > if opt.modality_clf == 'RGB':
> >     inputs_clf = inputs[:, :-1, :, :, :]
> > elif opt.modality_clf == 'Depth':
> >     inputs_clf = inputs[:, -1, :, :, :].unsqueeze(1)
> > elif opt.modality_clf == 'RGB-D':
> >     inputs_clf = inputs[:, :, :, :, :]
> > ```
> >
> > And the detector accepts a sample_duration of 8 in that pretrained model.
>
> Hello, I have a question: why did you add the line
>
> `inputs_clf = torch.Tensor(inputs_clf.numpy()[:,:,::2,:,:])`
>
> in online_test.py? It causes the error: RuntimeError: invalid argument 2: input image (T: 1 H: 4 W: 4) smaller than kernel size (kT: 2 kH: 4 kW: 4)

Correct, this line causes the error when running online_test.py. The error went away once I commented out that line.

cjtang avatar Oct 28 '22 23:10 cjtang

> @ahmetgunduz Thank you for your help. I tried your method, but the results still have not changed. I found that the background of the depth image in the dataset is black (picture above), but in the depth image I collected with the xiton camera (picture below), the hand in front is black and the background is not pure gray.
>
> I think there may be a problem with the depth image I am using. It seems that the images in the dataset have been processed, but I can't find a way to turn my pictures into the correct form.

Hello, did you end up solving this problem?

Oriseoss avatar Mar 31 '23 02:03 Oriseoss

> @ahmetgunduz Thank you for your help. I tried your method, but the results still have not changed. I found that the background of the depth image in the dataset is black (picture above), but in the depth image I collected with the xiton camera (picture below), the hand in front is black and the background is not pure gray.
>
> I think there may be a problem with the depth image I am using. It seems that the images in the dataset have been processed, but I can't find a way to turn my pictures into the correct form.

Hello, did you solve this problem in the end?

qyh-stbz avatar Apr 08 '24 15:04 qyh-stbz