
Model Speed

Open Basti110 opened this issue 3 years ago • 3 comments

Hey,

this is an amazing pose estimation AI; it works really well and is easy to use. I tried the new models and ran into some trouble. My goal is to run your AI at 30 fps on 4 cameras. Currently I get ~22-25 fps with batching and the old model (metrabs_multiperson_smpl), so I tried the new ones. But they are much slower than metrabs_multiperson_smpl; even the fastest one, with lower accuracy, is barely faster than the old one. Additionally, the TensorFlow load time for model = tf.saved_model.load(path) is very high: with the new models the system needs 18-62 seconds to be ready.

| Step / images per batch | metrabs_multiperson_smpl | metrabs_mob3l_y4 | metrabs_eff2s_y4 | metrabs_mob3l_y4t |
|---|---|---|---|---|
| Load model | 2.5 s | 15 s | 29 s | 52 s |
| First use | 6 s | 3 s | 6.5 s | 10 s |
| 1 | 23 ms | 28 ms | 85 ms | 250 ms |
| 2 | 30 ms | 28 ms | 86 ms | 250 ms |
| 4 | 40 ms | 37 ms | 90 ms | 260 ms |
| 8 | 76 ms | 58 ms | 113 ms | 300 ms |
| 16 | 154 ms | 116 ms | 220 ms | 600 ms |

Measured call: pred = model.estimate_poses_batched(images, boxes=ragged_boxes, intrinsic_matrix=intrinsics)
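For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing helper. It is not the script used for the numbers above: `time_call` and `fake_inference` are hypothetical names, and the stand-in workload only sleeps so the sketch runs without TensorFlow. The warm-up calls are discarded because, as the "First use" row shows, the first call is far slower than later ones.

```python
import statistics
import time


def time_call(fn, warmup=1, repeats=10):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # discard warm-up calls; the first use is much slower
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_inference():
    # Hypothetical stand-in for the measured call, e.g.
    # model.estimate_poses_batched(images, boxes=ragged_boxes,
    #                              intrinsic_matrix=intrinsics)
    time.sleep(0.002)


median_ms = time_call(fake_inference, warmup=1, repeats=5)
print(f"median latency: {median_ms:.1f} ms")
```

With a real model on a GPU, it is probably safest to have `fn` also convert one output to NumPy, so that the timer stops only after the GPU work has finished.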

1. So my first question is whether this is normal. With these values, "metrabs_multiperson_smpl" is the best model for real-time applications. But this model is "outdated" because its functions do not match the new code and API.

2. Questions about the model "metrabs_multiperson_smpl":

The paper states that the AI can handle up to 511 crops per second with a stride of 32 and a batch size of 8. Is this model built with a stride of 4?

If that's the case, I have to build the model myself again with a higher stride. Can you provide any checkpoints or models with a higher stride, trained on the same dataset and with the same backbone as metrabs_multiperson_smpl? If not, on which datasets was this model trained and how many epochs did it need? If I have to train it again anyway, I can also use the new API for this model.

My system: AMD Ryzen 5800X (8 x 3.8-4.7 GHz), RTX 3080, 32 GB RAM

Basti110 avatar Dec 02 '21 10:12 Basti110

Hi! Thanks for this detailed analysis! I have noticed slow initial load times as well, and 20-30 seconds seems to be the correct ballpark. I haven't had time to take a closer look at this issue yet. However, I'm in the process of doing some detailed timing benchmarks.

Meanwhile, you may want to check out the newly released ResNet-based models as well (https://github.com/isarandi/metrabs/blob/master/docs/MODELS.md). These are based on ResNetv1.5, initialized from weights exported from PyTorch checkpoints, which I found to work better. The old metrabs_multiperson_smpl used ResNet101v2.

The paper states that the AI can handle up to 511 crops per second with a stride of 32 and a batch size of 8. Is this model built with a stride of 4?

All these packaged models run at stride 32 (in my experience, denser striding usually has minimal quality impact except in the Human3.6M experiment, so the speed tradeoff is usually not worth it). The 511 crops/second figure does not take into account the time to decode an image, run detection, crop (reproject), etc.; it only measures the network described in the paper itself, i.e. from having a 256x256 image to getting the prediction. The full pipeline of a real application will always take longer in practice.
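To put these figures side by side, here is a small arithmetic sketch using only numbers quoted in this thread. The helper names are made up for illustration. Note that the 511 figure counts person crops while the table above counts images, so the two are not directly comparable when an image contains several people.

```python
def ms_per_batch(items_per_second, batch_size):
    """Time available (or spent) per batch at a given throughput."""
    return 1000.0 * batch_size / items_per_second


def images_per_second(batch_latency_ms, batch_size):
    """Throughput implied by a measured per-batch latency."""
    return 1000.0 * batch_size / batch_latency_ms


# Network-only figure from the paper: 511 crops/s at batch size 8.
network_only_ms = ms_per_batch(511, 8)

# Real-time target from the first post: 4 cameras at 30 fps each,
# processed as one batch of 4 images.
budget_ms = ms_per_batch(4 * 30, 4)

# Measured end-to-end call for metrabs_multiperson_smpl: 40 ms for 4 images.
measured_fps_per_camera = images_per_second(40, 4) / 4

print(f"network only, batch of 8: {network_only_ms:.1f} ms")
print(f"budget per batch of 4:    {budget_ms:.1f} ms")
print(f"measured per camera:      {measured_fps_per_camera:.1f} fps")
```

The measured 40 ms per batch of 4 corresponds to 25 fps per camera, which matches the ~22-25 fps reported above and shows the gap to the 33.3 ms budget needed for 30 fps.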

The training data is listed at https://github.com/isarandi/metrabs/blob/master/docs/MODELS.md. I haven't released preprocessing code for some of those datasets (at least not yet). It is a long and complicated process to get all that data to a form that can be used for training. The number of update steps was 400,000 with a batch size of 128, taking on the order of a week on an Nvidia A40 GPU.

isarandi avatar Dec 05 '21 21:12 isarandi

Thank you! The new models with resnet50 and resnet34 are perfect for me. But the load time is very high, although it is the same backbone as in metrabs_multiperson_smpl. Does the problem still exist if you build the model directly without tf.saved_model.load?

If it is that complex, then I will not train the model myself.

Basti110 avatar Dec 07 '21 17:12 Basti110

@Basti110, have you worked out how to reach real-time fps? I run it on Ubuntu 18 + 2080 Ti + TF 2.6 + CUDA 11.2 + cuDNN 8.1 with model.estimate_poses, and I get 10 fps. I will now switch to a 3090 GPU and test on Windows 10 and Ubuntu 18. Were you successful with resnet50 and resnet34? If so, what fps do you get on your 3080?

gao123qiang avatar Dec 28 '21 09:12 gao123qiang

Set num_aug=1 for fastest results. The new efficientnetv2-s models can run in real time even on a laptop GPU.
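As a rough sketch of what that could look like with the call measured earlier in this thread: the wrapper below assumes that `estimate_poses_batched` accepts `num_aug` as a keyword argument, which is not confirmed in this thread, so check it against the API documentation. `estimate_fast` and `_StubModel` are hypothetical names; the stub exists only so the sketch runs without TensorFlow or a downloaded model.

```python
def estimate_fast(model, images, boxes, intrinsics):
    # Assumption: estimate_poses_batched takes num_aug as a keyword argument.
    # num_aug=1 asks for a single pass instead of several augmented passes.
    return model.estimate_poses_batched(
        images, boxes=boxes, intrinsic_matrix=intrinsics, num_aug=1)


class _StubModel:
    """Hypothetical stand-in that records the keyword arguments it receives."""

    def estimate_poses_batched(self, images, **kwargs):
        self.last_kwargs = kwargs
        return {"poses3d": []}


stub = _StubModel()
result = estimate_fast(stub, images=[], boxes=[], intrinsics=None)
print(stub.last_kwargs["num_aug"])
```

With a real model, `model` would be the object returned by tf.saved_model.load(path).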

isarandi avatar Feb 17 '23 20:02 isarandi

chal zutha!!

kaiwalya1610 avatar Jul 29 '23 19:07 kaiwalya1610