Real-time-GesRec
Test the model with RGB_D camera
Hi, thanks for sharing your amazing model.
I am now trying to test the model with my RGB-D camera. However, I am a beginner in PyTorch, so I need some help going through the code:
- I plan to feed the model with depth images, which are obtained from the camera with OpenNI and OpenCV. The shape of each frame is (112,112,3). If I want to detect and classify n frames in each iteration, what shape should the input be?
- What does "sample_duration" mean? What is the difference between "sample_duration_det" and the detector queue?
I am using the EgoGesture depth model.
Hi @YuhanPeng1219 ,
Please see my comment https://github.com/ahmetgunduz/Real-time-GesRec/issues/17#issuecomment-492430596, regarding how to feed a camera image to the models. It should give you an idea.
Besides, please see my answers below:
I plan to feed the model with depth images, which are obtained from the camera with OpenNI and OpenCV. The shape of each frame is (112,112,3). If I want to detect and classify n frames in each iteration, what shape should the input be?

I see that you are somehow reading the wrong channels. A (112,112,3) image has the form (height, width, channel), and your channel count is 3, so it is either RGB or it has not been properly cast to a depth image. What I would recommend is to use the PIL package to format your image after reading it with OpenCV; see the modes in PIL for more information: https://pillow.readthedocs.io/en/5.1.x/handbook/concepts.html#modes. The shape of a frame for the depth model must be a (1, 1, 1, 112, 112) torch.Tensor, which is annotated as (batch size, channel, sample duration, height, width). sample duration is basically the number of frames.
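For illustration, here is a minimal sketch, not the repository's own code, of how a single OpenCV/OpenNI depth frame could be converted into such a (1, 1, 1, 112, 112) tensor via a single-channel PIL image. The function name and the 8-bit min-max scaling are assumptions for this example:

```python
# Minimal sketch (not the repository's pipeline) of turning one OpenCV depth
# frame into the (1, 1, 1, 112, 112) tensor described above. The variable
# names and the 8-bit scaling are assumptions for illustration.
import cv2
import numpy as np
import torch
from PIL import Image

def depth_frame_to_tensor(depth_frame):
    """depth_frame: a depth image read via OpenCV/OpenNI."""
    # If the frame still has 3 channels (e.g. because it was prepared for
    # cv2.imshow), collapse it back to a single channel first.
    if depth_frame.ndim == 3:
        depth_frame = cv2.cvtColor(depth_frame, cv2.COLOR_BGR2GRAY)
    # Scale to 8 bits and wrap in a PIL image with mode 'L' (single channel).
    depth_8bit = cv2.normalize(depth_frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    img = Image.fromarray(depth_8bit, mode='L').resize((112, 112))
    # (H, W) -> (batch size, channel, sample duration, H, W) = (1, 1, 1, 112, 112)
    arr = np.asarray(img, dtype=np.float32)
    return torch.from_numpy(arr).view(1, 1, 1, 112, 112)
```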
What does "sample_duration" mean? What is the difference between "sample_duration_det" and detector queue.
It is the number of the frames in a video clip. *_det
and *_clf
are annotations for detector and classifier architecture, respectively.
Hope this makes sense, Ahmet
Hi, @ahmetgunduz ,
Thanks for your reply. Actually, the frame shape is (112,112,3) because I want to use cv2.imshow() to show the depth image. I will try the PIL package then.
Also, when I feed the model with a (1,1,1,112,112) torch.Tensor, the detector model runs without error. However, the classifier model gives me this error message:
inputs_clf = torch.Size([1, 1, 1, 112, 112])
Traceback (most recent call last):
  File "video_test.py", line 337, in <module>
    outputs_clf = classifier(inputs_clf)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/yuhan/e71469ea-82d6-49ff-926e-dab1993ad0a2/Real-time-GesRec/models/resnext.py", line 169, in forward
    x = self.avgpool(x)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuhan/.local/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 636, in forward
    self.padding, self.ceil_mode, self.count_include_pad)
RuntimeError: invalid argument 2: input image (T: 1 H: 4 W: 4) smaller than kernel size (kT: 2 kH: 4 kW: 4) at /pytorch/aten/src/THCUNN/generic/VolumetricAveragePooling.cu:57
Any suggestions?
It is because of the sample_duration, I think. Can you share your bash file?
The bash file is as follows:
#!/bin/bash
python3 video_test.py \
    --root_path /media/yuhan/e71469ea-82d6-49ff-926e-dab1993ad0a2/Real-time-GesRec \
    --video_path video_screen \
    --annotation_path annotation_EgoGesture/egogestureall.json \
    --resume_path_det models/egogesture_resnetl_10_Depth_8.pth \
    --resume_path_clf models/egogesture_resnext_101_Depth_32.pth \
    --result_path results \
    --dataset egogesture \
    --sample_duration_det 8 \
    --sample_duration_clf 32 \
    --model_det resnetl \
    --model_clf resnext \
    --model_depth_det 10 \
    --model_depth_clf 101 \
    --resnet_shortcut_det A \
    --resnet_shortcut_clf B \
    --batch_size 1 \
    --n_classes_det 2 \
    --n_finetune_classes_det 2 \
    --n_classes_clf 83 \
    --n_finetune_classes_clf 83 \
    --n_threads 16 \
    --checkpoint 1 \
    --modality_det Depth \
    --modality_clf Depth \
    --n_val_samples 1 \
    --train_crop random \
    --test_subset test \
    --det_strategy median \
    --det_queue_size 4 \
    --det_counter 2 \
    --clf_strategy median \
    --clf_queue_size 16 \
    --clf_threshold_pre 0.6 \
    --clf_threshold_final 0.15 \
    --stride_len 2
But when using the classifier model in online_test.py, the sample_duration is not involved:
if opt.modality_clf == 'RGB':
    inputs_clf = inputs[:,:-1,:,:,:]
elif opt.modality_clf == 'Depth':
    inputs_clf = inputs[:,-1,:,:,:].unsqueeze(1)
elif opt.modality_clf =='RGB-D':
    inputs_clf = inputs[:,:,:,:,:]

outputs_clf = classifier(inputs_clf)
outputs_clf = F.softmax(outputs_clf, dim=1)
outputs_clf = outputs_clf.cpu().numpy()[0].reshape(-1,)
aha my bad! @YuhanPeng1219 According to the (batch size, channel, sample duration, height, width) annotation, your input must be (1,1,32,112,112) since you are using egogesture_resnext_101_Depth_{32}.pth, where {} stands for the sample duration of the input frames. And you need to configure this code snippet, because by default inputs is in "RGB-D" format:
if opt.modality_clf == 'RGB':
    inputs_clf = inputs[:,:-1,:,:,:]
elif opt.modality_clf == 'Depth':
    inputs_clf = inputs[:,-1,:,:,:].unsqueeze(1)
elif opt.modality_clf =='RGB-D':
    inputs_clf = inputs[:,:,:,:,:]

And the detector accepts sample_duration as 8 in that pretrained model.
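To make the two clip lengths concrete, here is a minimal sketch, with hypothetical names such as frame_buffer and build_inputs, of how a rolling 32-frame depth buffer could yield both the detector input (last 8 frames) and the classifier input (all 32 frames). It is not the logic of online_test.py, just one way to produce tensors of the shapes discussed above:

```python
# Minimal sketch of a rolling frame buffer feeding both models.
# Frame acquisition, preprocessing, and the model objects are assumed
# to exist elsewhere; all names here are hypothetical.
from collections import deque

import torch

SAMPLE_DURATION_CLF = 32   # classifier clip length (egogesture_resnext_101_Depth_32)
SAMPLE_DURATION_DET = 8    # detector clip length (egogesture_resnetl_10_Depth_8)

frame_buffer = deque(maxlen=SAMPLE_DURATION_CLF)  # each entry: a (112, 112) float tensor

def build_inputs(new_frame):
    """new_frame: a preprocessed (112, 112) depth frame as a torch.Tensor."""
    frame_buffer.append(new_frame)
    if len(frame_buffer) < SAMPLE_DURATION_CLF:
        return None, None  # not enough frames buffered yet
    # Stack along the temporal axis: (32, 112, 112) -> (1, 1, 32, 112, 112)
    clip = torch.stack(list(frame_buffer), dim=0).unsqueeze(0).unsqueeze(0)
    inputs_clf = clip                                     # (1, 1, 32, 112, 112)
    inputs_det = clip[:, :, -SAMPLE_DURATION_DET:, :, :]  # (1, 1, 8, 112, 112)
    return inputs_det, inputs_clf
```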
@ahmetgunduz Thanks for your help again. Now the program can run without error. However, the results are always zero.
Start Evaluation
det_selected_queue = [0. 0.]
pridiction_det = 0
prob_det = 0.0
inputs_clf = torch.Size([1, 1, 32, 112, 112])
clf result [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
img_size = (112, 112, 1)
Have you guys seen this before? I have used PIL to format my input images; the mode I chose is 'L'.
I think it is because prediction_det is always 0. If it is not 1, the classifier result will always be 0.
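A minimal sketch of that gating idea, with hypothetical names and simplified compared to the queue-based logic in online_test.py: the classifier scores only become non-zero when the detector predicts class 1 (gesture present).

```python
# Minimal sketch of the detector-gates-classifier behavior described above.
# `detector` and `classifier` are the loaded models; everything else is
# hypothetical and much simpler than online_test.py.
import torch
import torch.nn.functional as F

def classify_if_gesture(detector, classifier, inputs_det, inputs_clf, num_classes=83):
    with torch.no_grad():
        prediction_det = detector(inputs_det).argmax(dim=1).item()  # 0 = no gesture, 1 = gesture
        if prediction_det != 1:
            # No gesture detected: the classifier scores stay at zero.
            return prediction_det, torch.zeros(num_classes)
        outputs_clf = F.softmax(classifier(inputs_clf), dim=1)
        return prediction_det, outputs_clf.cpu().squeeze(0)
```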
@ahmetgunduz Thanks a lot for sharing your models. I also ran into some trouble while testing the model with an RGB-D camera. Here I list my steps; are there any problems?
- I set the modality as 'Depth', get the data from the depth camera, and resize it to (112,112).
- I put the data of 32 frames into a list called A, so the shape of this list is (32,112,112).
- I reshape this list A to (1,1,32,112,112) and call it total_frames: total_frames = np.reshape(total_sample, (1, 1, sample_duration_clf, 112, 112))
- I take the last eight frames and feed them into the detector: inputs_det = torch.from_numpy(total_frames).float()[:,:,-sample_duration_det:,:,:,] followed by outputs_det = detector(inputs_det)
- Finally, the raw output of the detector is wrong, and the result after softmax is 1 all the time. Part of the results are as follows:
  outputs_det: tensor([[-71.0893, 82.3528]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-74.1505, 80.2766]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-72.9503, 80.4595]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-75.3347, 79.0474]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-76.2401, 78.2442]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-72.4066, 73.7980]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-73.1670, 74.1967]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-73.1590, 77.5848]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-79.0164, 74.5970]], grad_fn=<AddmmBackward>) prob_det: 1.0
  outputs_det: tensor([[-78.1467, 78.0265]], grad_fn=<AddmmBackward>) prob_det: 1.0
I think there may be something wrong with the data I feed into the detector, but I can't figure it out. Any suggestion would help me a lot. Thanks again for sharing.
Hi @likestudy , did you crop and normalize the input images using the spatial transformation defined here https://github.com/ahmetgunduz/Real-time-GesRec/blob/cec7050b3cea2d3ec33967a1552d2022b89d4166/online_test.py#L147-L151 ? Other than that, everything looks fine.
In online_test.py, I fed this transformation to the dataset, and you can apply it to your frames as it is done here: https://github.com/ahmetgunduz/Real-time-GesRec/blob/cec7050b3cea2d3ec33967a1552d2022b89d4166/datasets/egogesture_online.py#L231-L232
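As an illustration only, the per-frame transform could be approximated with torchvision equivalents as sketched below, rather than the repository's own spatial_transforms module; the normalization statistics are placeholders, not the values used for the pretrained EgoGesture models:

```python
# Illustration only: per-frame cropping/normalization with torchvision
# transforms. The mean/std below are placeholder values.
import torch
from torchvision import transforms

spatial_transform = transforms.Compose([
    transforms.Resize(112),
    transforms.CenterCrop(112),
    transforms.ToTensor(),                        # mode 'L' frame -> (1, 112, 112)
    transforms.Normalize(mean=[0.5], std=[0.5]),  # placeholder statistics
])

def frames_to_clip(pil_frames):
    """pil_frames: list of 32 PIL images in mode 'L'. Returns a (1, 1, 32, 112, 112) tensor."""
    clip = torch.stack([spatial_transform(f) for f in pil_frames], dim=1)  # (1, 32, 112, 112)
    return clip.unsqueeze(0)                                               # (1, 1, 32, 112, 112)
```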
@ahmetgunduz Thank you for your help. I tried your method, but the results still have not changed.
I found that the background of the depth images in the dataset is black (picture above), but in the depth image I collected with the Xtion camera (picture below), the hand in front is black and the background is not pure gray.
I think there may be a problem with the depth images I am using. It seems that the images in the dataset have been processed, but I can't find a way to convert my images into the correct form.
Hi @likestudy , have you figured out how to process the images?
Hello, I have a question: why did you add the line
inputs_clf = torch.Tensor(inputs_clf.numpy()[:,:,::2,:,:])
in online_test.py? It causes the error: RuntimeError: invalid argument 2: input image (T: 1 H: 4 W: 4) smaller than kernel size (kT: 2 kH: 4 kW: 4)
Correct, this line causes the error when running online_test.py. The error is gone once I comment out that line.
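For intuition, here is a standalone shape check, not repository code, of why that slice can trip the classifier's final average pooling, assuming the roughly 16x temporal downsampling of the 3D ResNeXt-101:

```python
# Standalone shape check (not repository code) showing why the
# `[:, :, ::2, :, :]` slice can break the classifier's average pooling.
# The ~16x temporal downsampling factor is an assumption about the 3D
# ResNeXt-101 used here; it matches the "T: 1 ... kT: 2" error message.
import math

import torch

sample_duration_clf = 32
inputs_clf = torch.zeros(1, 1, sample_duration_clf, 112, 112)

# The questioned line halves the number of frames: 32 -> 16.
inputs_clf = torch.Tensor(inputs_clf.numpy()[:, :, ::2, :, :])
print(inputs_clf.shape)  # torch.Size([1, 1, 16, 112, 112])

# After ~16x temporal downsampling inside the network, 16 frames shrink to a
# temporal feature size of 1, smaller than the expected pooling kernel of 2,
# which triggers the "smaller than kernel size" RuntimeError.
print(math.ceil(16 / 16))  # 1  (actual feature T)
print(math.ceil(32 / 16))  # 2  (pooling kernel kT for the 32-frame model)
```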
Hello, did you solve this problem in the end?
Hi, did you end up solving this problem?