norfair
CPU bottleneck when running the pose estimation demo
Hi,
I am trying to track pose estimates using the "Tracking pedestrians with AlphaPose" demo as a reference. However, I am using NVIDIA trt_pose (https://github.com/NVIDIA-AI-IOT/trt_pose) instead of AlphaPose as used in the demo.
Pose estimation alone runs well at around 25 fps (with about 50% CPU usage). However, when I add pose tracking, my frame rate drops to about 10-12 fps, and it is definitely a CPU bottleneck, as my CPU usage is around 98% when tracking is running. I would like to know whether this is considered "normal" for pose tracking or whether I am doing something wrong on my end.
PC specs: GTX 1060 6GB, Intel i7 8500 H, 6GB RAM
Thanks for the great work.
Hi,
Do you track every skeleton point of each individual? Are you dealing with a lot of people in your videos?
If not done yet, did you try with a sample video containing only one person? Are the results in terms of FPS the same?
Hi,
I tested on a video with 1 person, tracking 18 keypoints, which gives about 10 fps. I also tested on a video with 4 people and got around 10 fps too. I also experimented with tracking only 6 or 8 keypoints per person, which increases the frame rate to about 20 fps.
Can you please share your code and sample video you are using?
I'll look into it.
Hi,
Since I have already integrated this code into another project, I cannot share the entire thing. However, I made a demo Python script of what I'm trying to do. I've simplified a lot of things and generated some fake keypoints. If I disable the tracker (enable_tracker=False), the code runs with < 10% CPU utilization; with enable_tracker=True, I get 100% CPU utilization.
Also, although my actual code shows an fps drop, I cannot replicate that drop here, most likely because this demo code is too simple to cause one. However, the high CPU utilization is very strange indeed.
https://gist.github.com/pramod-wick/7338033c8ce03285cc6e2662f746da56
Hi,
I ran your code and got the same CPU utilization as you. I also did not notice any drop in fps.
Are you sure that this drop is not due to another part of your code?
I suggest profiling your application so that you can see exactly which functions take the most time. To do that, you can run the following command:
python -m cProfile --sort cumulative YOUR_APPLICATION.py &> out.log
A file called "out.log" will be created. Inside it, you'll find a table with the cumulative times that each function took. Feel free to share this table if you need another eye.
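If wrapping the whole script is inconvenient, the same profiler can be driven from inside Python around just the frame loop. The following is only a minimal sketch: `fake_tracking_step` and `profile_frames` are hypothetical names standing in for one iteration of your real loop, not part of norfair or trt_pose.

```python
import cProfile
import io
import pstats


def fake_tracking_step(n_points=18):
    # Hypothetical stand-in workload; replace with one iteration of your loop.
    return sum(i * i for i in range(n_points * 1000))


def profile_frames(step, n_frames=100, top=10):
    """Profile n_frames calls of step and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_frames):
        step()
    profiler.disable()

    # Sort by cumulative time, like `--sort cumulative` on the command line.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_frames(fake_tracking_step, n_frames=20)
print(report)
```

Limiting the report to the top few entries makes it easier to see whether the time goes to detection, tracking, or something else.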
Hi,
Thanks for the quick response and support :). Could you confirm whether the CPU utilization increases with "enable_tracker=True"? I also ran the cProfile command, but I do not see any problems in the log file (attached): out.log
Hi,
I confirm that the CPU utilization is also at 100% with the tracker enabled, but there is no drop in fps.
Regarding the profiling, I was referring to your initial application, i.e. the one for which you notice high FPS drops. Can you please perform a profiling of this application and analyze the log? Sorry for the misunderstanding.
Hi,
I profiled 100 frames of my original application with the norfair pose tracker (posesort.log) and with the SORT tracker (abewley/sort) using bounding box input (sort.log).
There definitely is an increase in inference time. From the log files, my guess is that the 100% CPU utilization with the norfair pose tracker causes my CNN (densenet) to take extra processing time.
I would like to know whether CPU utilization could be reduced in any way, or whether this is expected behavior when tracking 18 points.
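One way to check that guess is to time each stage of the sequential loop separately, recording both wall-clock and CPU time. This is only a sketch: `StageTimer` is a hypothetical helper, and the stage names and functions passed to it are placeholders for the real inference and tracking calls.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulate wall-clock and process CPU time per named stage."""

    def __init__(self):
        self.wall = defaultdict(float)
        self.cpu = defaultdict(float)

    def run(self, name, func, *args):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()  # CPU time summed over all threads
        result = func(*args)
        self.cpu[name] += time.process_time() - cpu_start
        self.wall[name] += time.perf_counter() - wall_start
        return result

    def summary(self):
        # A stage whose CPU time exceeds its wall time is using several cores.
        return {name: (self.wall[name], self.cpu[name]) for name in self.wall}


timer = StageTimer()
for _ in range(5):
    # Placeholder stages; swap in the real pose estimation and tracker update.
    total = timer.run("inference", sum, range(10000))
    ordered = timer.run("tracking", sorted, [3, 1, 2])
print(timer.summary())
```

If GPU calls are asynchronous, they may need explicit synchronization before the wall-clock figure for the inference stage is meaningful.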
Thanks again for the support
Hi @pramod-wick, you should be getting much more than 10fps with that machine. There are a ton of CPU speed optimization opportunities that we plan to tackle soon, but even without those you should be getting larger numbers than those.
Also, thank you @lweicker for the help with answering!
Hi,
I checked your logs; the differences in processing time for trtkeypoints.py:98(find_key_points) and densenet.py:XX(forward) between SORT and norfair are odd. I don't understand how the tracking could influence the densenet processing at all. Was the input video the same for both logs? Is your processing done sequentially? Did you change only the tracking algorithm between the two runs, or is anything else different?
For information, I also run trt_pose and norfair (among other processes) in one of my applications. I run it on an Nvidia Jetson Xavier NX. My application captures two live RTSP streams at 1080p. The average processing rate is about 18 fps (over 4500 iterations), with a minimum of 11 fps.
The only difference I can imagine is the number of tracked points. In my case, I track only 1 point per person (a combination of keypoints) instead of the 18 you mentioned. In each frame I track between 0 and 15 detected people.
In another case, I use norfair combined with an SSD MobileNet v2 model (among other processes) on another Xavier NX. It averages 15.5 fps with a minimum of about 9 fps, but this time for 4 RTSP input streams (3x 1080p, 1x 4K).
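As an illustration of reducing a skeleton to a single tracked point, one option is to average the confidently detected keypoints. This sketch assumes keypoints come as (x, y, score) tuples; `person_anchor` and the `min_score` threshold are hypothetical, and the actual combination used above may differ.

```python
def person_anchor(keypoints, min_score=0.3):
    """Return the mean (x, y) of keypoints whose score is at least min_score.

    keypoints: iterable of (x, y, score) tuples (assumed format).
    Returns None when no keypoint is confident enough.
    """
    confident = [(x, y) for x, y, score in keypoints if score >= min_score]
    if not confident:
        return None
    mean_x = sum(x for x, _ in confident) / len(confident)
    mean_y = sum(y for _, y in confident) / len(confident)
    return (mean_x, mean_y)


skeleton = [(0.0, 0.0, 0.9), (2.0, 4.0, 0.8), (100.0, 100.0, 0.1)]
anchor = person_anchor(skeleton)
print(anchor)
```

The resulting point can then be passed to the tracker as a one-point detection, so the tracker handles one point per person instead of 18.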
@joaqo Yes, definitely looking forward to those optimizations :)
@lweicker I agree the results are strange indeed, because the trt pose densenet is completely decoupled from the tracking, so the tracking should not influence it. It was the same program on the same video (100 frames), only difference was swapping the tracking algorithm, and yes the program is sequential.
Good to hear it is running well on the Jetson devices. In my case it may very well be the number of keypoints: if I reduce the number of keypoints to around 6, the CPU utilization drops below 20% with no significant change in fps.
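For reference, picking a subset of keypoints before they reach the tracker can be sketched as follows. The indices in `TRACKED_INDICES` are hypothetical placeholders; which joints they correspond to depends on the keypoint ordering of the pose model in use.

```python
# Hypothetical choice of 6 out of 18 keypoint indices; adjust to your model.
TRACKED_INDICES = (0, 5, 6, 11, 12, 17)


def select_keypoints(skeleton, indices=TRACKED_INDICES):
    """Keep only the keypoints at the given indices, preserving their order."""
    return [skeleton[i] for i in indices]


# Fake 18-point skeleton of (x, y) pairs, for illustration only.
full_skeleton = [(float(i), float(i * 2)) for i in range(18)]
reduced = select_keypoints(full_skeleton)
print(len(reduced), reduced[1])
```

The same index tuple has to be applied to every detection so that each tracked point keeps a consistent meaning from frame to frame.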
The optimized Kalman filter is default since #145. We also have a profiling demo that uses TRT pose! Therefore, I am closing this issue.
Please open another issue should you encounter more performance issues in the future :muscle: