
High CPU utilization during inference

Open chenshihfang opened this issue 2 years ago • 11 comments

Hi, during inference I notice that the program occupies a lot of CPU load.

(screenshot: CPU load before running the program)

(screenshot: CPU load after running the program with the thread count set to 1)

(screenshot: CPU load with the thread count set to 30)

It seems that there is no improvement.

Is there any way to reduce the CPU utilization?

chenshihfang avatar Sep 09 '22 10:09 chenshihfang

Hm, I hadn't noticed that before. I currently don't know what is causing it. Let me know if you find the issue.

2006pmach avatar Sep 13 '22 20:09 2006pmach

@sfchen94 - Which tracker are you running?

srama2512 avatar Sep 15 '22 06:09 srama2512

@srama2512 DiMP50, but I guess other trackers may have a similar problem.

chenshihfang avatar Sep 15 '22 07:09 chenshihfang

By the way, this problem only occurs during inference; the training stage does not have this issue.

chenshihfang avatar Sep 15 '22 07:09 chenshihfang

@sfchen94 - Got it. I'm noticing high CPU usage during inference with the KYS tracker as well, while the GPU usage is quite low.

srama2512 avatar Sep 15 '22 13:09 srama2512

Hm, I am still not sure why this is happening; maybe it is related to OpenCV. What helps to reduce the load is limiting the CPUs available to the Python script with taskset --cpu-list 0-1, which restricts it to two cores. For example, running taskset --cpu-list 0-1 python run_tracker.py tomp tomp50 lasot reduces the CPU workload without decreasing the FPS of the tracker. However, since the FPS does not include data-loading time, the overall throughput might be lower. Maybe @goutamgmb has an idea?

2006pmach avatar Sep 15 '22 14:09 2006pmach
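As a Linux-only aside (my own addition, not part of pytracking), the same core pinning that taskset does from the shell can also be done from inside Python with os.sched_setaffinity:

```python
import os

# In-process equivalent of `taskset`: restrict this process (pid 0 = self)
# to a subset of the cores it is currently allowed to run on.
allowed = sorted(os.sched_getaffinity(0))
os.sched_setaffinity(0, {allowed[0]})   # pin to the first allowed core

print(sorted(os.sched_getaffinity(0)))  # now a single core
```

This only changes scheduling, not how many threads the libraries spawn, which is why the total CPU work stays the same.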

@2006pmach Cool, it works! But why does it reduce the CPU load while keeping the same FPS? 😆

chenshihfang avatar Sep 15 '22 14:09 chenshihfang

To compute the FPS, we only measure the time that the tracker takes, namely the call out = tracker.track(image, info); everything else is excluded. So the overall runtime of the script could be higher now, since for example the data-loading time could have increased (which is not reflected in the FPS). I didn't check this, though. It is still not clear to me what is causing the high CPU load or what those cores are doing exactly...

2006pmach avatar Sep 15 '22 15:09 2006pmach
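To illustrate the distinction described above (a minimal sketch with hypothetical names; pytracking's actual evaluation loop differs), FPS computed from tracker-only time can stay constant even while the wall-clock runtime grows:

```python
import time

def run_sequence(tracker_track, frames):
    """Return (fps, wall_time): fps counts only the tracker_track() calls,
    mirroring how per-frame tracking speed is typically reported."""
    track_time = 0.0
    start = time.perf_counter()
    for image in frames:
        t0 = time.perf_counter()
        tracker_track(image)            # only this call is timed for FPS
        track_time += time.perf_counter() - t0
        # Image loading/decoding would happen elsewhere in the loop and is
        # NOT included in track_time, so FPS stays constant even if loading
        # slows down under a tight CPU limit.
    wall_time = time.perf_counter() - start
    return len(frames) / track_time, wall_time
```

With a slow data loader, wall_time rises while the reported FPS is unchanged.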

@2006pmach - Thanks for the taskset solution. It appears to be working for now. I restricted the process to cores 0-39 on my 80-core cluster machine. Interestingly, I'm observing that kernel threads (red) occupy more of the CPU load than normal threads (green). Is this suggestive of anything specific to you? (screenshot: htop showing mostly red/kernel-thread CPU bars)

srama2512 avatar Sep 15 '22 15:09 srama2512

Yes, the real problem is not solved. For example, I have 40 CPU cores in total. Initially, the program needs 50% of the total CPU load. When I force it to use cores 1-10, it is of course capped at 25% of the total CPU.

But in effect that just squeezes the same 50% worth of work onto 25% of the cores; the program still needs the same amount of CPU after we use taskset.

chenshihfang avatar Sep 15 '22 16:09 chenshihfang

Hi. I think the following code may help you to solve this issue. In my case, the CPU occupation is reduced by inserting this code, and the inference speed also improves a little.

import os
import torch

cpu_num = 8  # number of CPUs you want to use
os.environ['OMP_NUM_THREADS'] = str(cpu_num)
os.environ['OPENBLAS_NUM_THREADS'] = str(cpu_num)
os.environ['MKL_NUM_THREADS'] = str(cpu_num)
os.environ['VECLIB_MAXIMUM_THREADS'] = str(cpu_num)
os.environ['NUMEXPR_NUM_THREADS'] = str(cpu_num)
torch.set_num_threads(cpu_num)

Little-Podi avatar Sep 17 '22 04:09 Little-Podi
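One caveat worth noting about the snippet above (my own observation, not stated in the thread): the *_NUM_THREADS environment variables are read once when the BLAS/OpenMP runtimes initialize, so they should be set before numpy/torch are imported, while torch.set_num_threads can be called at any point afterwards. A minimal ordering sketch:

```python
import os

cpu_num = 8  # number of threads to allow (value from the comment above)

# These variables must be set BEFORE importing numpy/torch, because the
# BLAS/OpenMP runtimes read them only once, at import time.
for var in ('OMP_NUM_THREADS', 'OPENBLAS_NUM_THREADS', 'MKL_NUM_THREADS',
            'VECLIB_MAXIMUM_THREADS', 'NUMEXPR_NUM_THREADS'):
    os.environ[var] = str(cpu_num)

try:
    import torch                    # imported only after the env vars are set
    torch.set_num_threads(cpu_num)  # caps torch's intra-op thread pool
except ImportError:
    pass  # torch not installed; the env vars alone still cap BLAS/OpenMP
```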

> Hi. I think the following code may help you to solve this issue. [...]

This method does ease the CPU load, so I am closing this issue for now.

chenshihfang avatar Oct 13 '22 08:10 chenshihfang