VIBE
Online Inference
How do I get the online inference?
This is something we are planning to work on. I will update here.
I have run a couple of instances on Colab; it works really well. Here is a link to my Colab notebook, which should work right away. I recorded a video of a couple of inferences (VIBE rendering starts @ 23"): https://youtu.be/xyo5gl5GLEI
Hope it helps, good luck!
It's throwing the following error
NameError: name 'YouTubeVideo' is not defined
Let me have a look. I cleaned up the notebook before posting, so I may have dropped an instruction. I will double-check later today and get back to you.
Should work now, have a look at the updated notebook on my repo. I initially forgot to declare the YouTube download module... Let me know if you hit any issues. If you want to feed the model with a webcam stream, you will have to play with the VIBE code, I am afraid (I have not looked at that part yet).
@Tetsujinfr this doesn't look like online inference code. One needs to run VIBE together with the multi-person tracker. We use an online tracker, but it is a bit hacky to integrate it with VIBE.
Sorry, did I do something incorrectly? I put it together quickly; did I miss a step in the inference pipeline?
Not really. What @Zumbalamambo is asking for here is online inference on a live feed, e.g. a webcam. To achieve this you need to adapt multi-person-tracker; I didn't see that in your implementation.
Ok, got you. I misunderstood this thread: I thought #46 was the one asking for the webcam, and this one was just "how do I run this model online (end to end)". Indeed, the notebook I quickly put together is a bit hacky and not meant to be a proper piece of code (I am not a coder). Regarding the webcam piece, do you have a suggestion w.r.t. performance? Executing the 2D pose estimation and then the VIBE model requires sequential model loading (I assume both models cannot be held in memory at the same time, but I do not know for sure), so the end-to-end inference performance would likely suffer a lot. Am I correct? Do you have an approach for this? Thanks a lot.
Also looking for webcam inference. Maybe I can help out if it's already underway? Kindly get back to me, thanks!
@jaggernaut007 Yes! Trying to do webcam inference.
Did anyone try webcam inference? If yes, please share the changes that need to be made, or share the inference code. Thanks.
This is something we are planning to work on. I will update here.
Hello @mkocabas! First, thank you for sharing the work. Is there an update on the real-time/webcam implementation?
Thanks
Hello all. I have made some simple additions to the demo code to include webcam inference ( https://github.com/gayatriprasad/VIBE ). Feel free to give it a try and pass on feedback. You can run it using the following command: python demo.py --webcam --display --output_folder webcam/
Hi @gayatriprasad , Can you share with me the code ? I'll try it. My Gmail :- [email protected]
Added the link to the code in the comment.
Hey @gayatriprasad, you only added inference code that captures video from the webcam, stores it on the local machine, and then feeds it to VIBE as the input video_file to produce results. That's not real-time.
Hi there. Any updates on the live inference feature?
I implemented a frame-by-frame inference version of VIBE, which is capable of real-time inference. Project: https://github.com/zc402/RT-VIBE. Try it on Google Colab: https://colab.research.google.com/drive/1VKXGTfwIYT-ltbbEjhCpEczGpksb8I7o?usp=sharing
Interface:

```python
import cv2
from vibe.rt.rt_vibe import RtVibe

rt_vibe = RtVibe()

cap = cv2.VideoCapture('sample_video.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # stop at end of stream
        break
    rt_vibe(frame)  # This will open a cv2 window
cap.release()
```
(I don't have a camera on my machine, so I can't test the camera code on Google Colab; it may not work.)
@zc402 I have tested your code, it's really awesome. Thanks a lot.
@ykk648 you're welcome
I found the camera params converge worse, and I use weak perspective in my code, in which kp_2d = scale * (kp_3d[:, :2] + txy). I think the key reason is that the focal length differs per image in the dataset, ranging from 400 mm to 800 mm, so maybe the network cannot regress the scale well? As noted, "It is common to assume a fixed focal length to perform perspective projection." I wonder if the performance would improve if I used perspective projection instead of weak perspective?
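For concreteness, here is a small NumPy sketch of the two projection models being compared (function and variable names are mine for illustration, not from the VIBE code). Weak perspective applies a single scale and a 2D translation to the x/y coordinates and ignores depth, so that one scale must absorb the focal length; full (pinhole) perspective divides by depth and multiplies by the focal length, so a per-image focal length directly changes the projected scale:

```python
import numpy as np

def weak_perspective(kp3d, scale, txy):
    """Weak perspective: kp2d = scale * (kp3d[:, :2] + txy).
    Depth is ignored; one global scale stands in for focal/depth."""
    return scale * (kp3d[:, :2] + txy)

def full_perspective(kp3d, focal, cam_t):
    """Full (pinhole) perspective: translate into the camera frame,
    then divide x, y by depth z and multiply by the focal length."""
    p = kp3d + cam_t
    return focal * p[:, :2] / p[:, 2:3]

# A toy 3D keypoint set (meters), all points roughly 5 m from the camera.
kp3d = np.array([[ 0.0,  0.0, 5.0],
                 [ 0.5, -0.2, 5.0],
                 [-0.3,  0.4, 5.0]])

# With a 400 vs 800 focal length, the same 3D pose projects at
# half/double the image scale -- exactly the scale the network
# would have to regress differently for each image.
p400 = full_perspective(kp3d, 400.0, np.zeros(3))
p800 = full_perspective(kp3d, 800.0, np.zeros(3))
print(np.allclose(p800, 2.0 * p400))  # True: projected scale doubles
```

Under weak perspective the two focal lengths are indistinguishable from a change in the regressed scale, which is consistent with the convergence issue described above.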