
pycocotools for retinanet

Open • psyhtest opened this issue 3 years ago • 14 comments

We've been using the standard pycocotools Python package for calculating object detection accuracy since MLPerf Inference v0.5. It used to be OK for SSD-ResNet34 and SSD-MobileNet-v1, but it is rather painful for RetinaNet. First, the calculation is slow: it takes ~7-8 minutes per scenario per system on a decent workstation, i.e. ~15-25 minutes per system. Second, it is memory hungry: see the screenshot below of the calculation strangling an edge appliance with 8 GB RAM and 4 GB swap.

[screenshot: pycocotools memory consumption on the edge appliance]

psyhtest avatar Sep 27 '22 22:09 psyhtest
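For context, the accuracy check discussed above boils down to the standard COCOeval flow. Here is a minimal sketch (the file paths are hypothetical, not the ones in the MLPerf script):

```python
# A minimal sketch of the standard pycocotools accuracy check (file paths are
# hypothetical). Both evaluate() and accumulate() run single-threaded, which is
# where the time and memory reported above go.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/openimages-mlperf.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")          # detections from the accuracy run

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()    # per-image, per-category matching against the ground truth
coco_eval.accumulate()  # build precision/recall curves over all images
coco_eval.summarize()   # prints the COCO metrics; stats[0] is the headline mAP
print("mAP =", coco_eval.stats[0])
```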

Hi @psyhtest, are you referring to this script?

arjunsuresh avatar Oct 08 '22 15:10 arjunsuresh

Yes, Arjun.

psyhtest avatar Oct 08 '22 15:10 psyhtest

Thank you Anton for your reply. On my laptop, during the accuracy run over 5,000 images, the accuracy script processes about 60 images per second (faster than the workstation?), which is almost 60 times faster than the CPU inference speed and so hardly noticeable. I tried calling the cocoEval library from multiple threads, since the description given here suggests that the images can be processed in parallel before calling the accumulate function. But when we split the image list and process the parts separately, the calculated scores change. So the only option for speeding up the processing seems to be parallelism inside the evaluation function itself, which is what this Nvidia fork does.

arjunsuresh avatar Oct 10 '22 07:10 arjunsuresh
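For reference, here is roughly what the chunked, multi-threaded attempt described above looks like with the stock API; this is a sketch (chunk count and file paths are hypothetical), not the code that was actually run:

```python
# A sketch of the chunked/threaded attempt described above. evaluate() splits
# cleanly across image chunks, but accumulate() builds the precision/recall
# curves from the matches of *all* images jointly, so per-chunk scores do not
# reproduce the single-run result.
from concurrent.futures import ThreadPoolExecutor
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/openimages-mlperf.json")
coco_dt = coco_gt.loadRes("detections.json")

img_ids = sorted(coco_gt.getImgIds())
n_chunks = 4
chunks = [img_ids[i::n_chunks] for i in range(n_chunks)]

def eval_chunk(chunk):
    e = COCOeval(coco_gt, coco_dt, iouType="bbox")
    e.params.imgIds = chunk
    e.evaluate()     # the per-image matching step: safe to run in parallel
    e.accumulate()   # accumulating per chunk is where the scores diverge
    e.summarize()
    return e.stats[0]

with ThreadPoolExecutor(max_workers=n_chunks) as pool:
    chunk_maps = list(pool.map(eval_chunk, chunks))

# Averaging the per-chunk mAPs is not equivalent to the global mAP.
print("per-chunk mAPs:", chunk_maps)
```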

@pgmpablo157321 To look at Arjun's proposal and give feedback on the feasibility.

rnaidu02 avatar Oct 11 '22 15:10 rnaidu02

@pgmpablo157321 To use the Nvidia fork of pycocotools, we need to add instructions for installing the fork and also update the accuracy numbers, as there can be a slight difference. We'll be checking these and can give you an update by next week.

arjunsuresh avatar Oct 12 '22 08:10 arjunsuresh

Unfortunately, the Nvidia fork is not working well with RetinaNet. This commit fixes the issue with the PythonAPI, but the C++ extension is giving poor accuracy.

arjunsuresh avatar Oct 21 '22 17:10 arjunsuresh

@nv-ananjappa

rnaidu02 avatar Nov 01 '22 16:11 rnaidu02

This is the patch we applied to the inference repo when running with nvidia-pycocotools.

arjunsuresh avatar Nov 01 '22 17:11 arjunsuresh

@arjunsuresh We are using the (slow) script for MLPerf Inference too. 😁 Since you seem to be familiar with it, would you like to contribute by adding support for the faster NVIDIA cocoapi?

nv-ananjappa avatar Nov 02 '22 22:11 nv-ananjappa

Thank you @nv-ananjappa for checking. Unfortunately, I'm not familiar enough with cocoapi to make that change :innocent: I had tried to parallelize the Python API, but realized that the original implementation is inherently sequential, which is why the Nvidia fork with a C++ extension made sense. I'll file my accuracy result as an issue on the Nvidia fork - it might be an easy fix for the original developer.

Meanwhile, we are waiting about an hour for the accuracy run of retinanet on an Nvidia T4 GPU (using the reference implementation), so 6-7 extra minutes are hardly noticeable :smile:

arjunsuresh avatar Nov 04 '22 08:11 arjunsuresh

@nv-ananjappa This is done now. This patch enables nvidia-pycocotools for the OpenImages accuracy run and speeds up the accuracy check from 7.5 minutes to 2 minutes.

arjunsuresh avatar Dec 30 '23 20:12 arjunsuresh

@arjunsuresh That's great! How about memory consumption?

psyhtest avatar Jan 02 '24 12:01 psyhtest

Hi @psyhtest It was about 0.5% on a 768 GB system. The original pycocotools had gone up to 1.6% of memory.

arjunsuresh avatar Jan 02 '24 23:01 arjunsuresh

@psyhtest Unfortunately, even with the new change, the accuracy run fails on the Thundercomm RB6 (8 GB RAM and 4 GB swap space). It runs fine in 46 s on an Intel Sapphire Rapids system with 256 GB RAM (only about 1% of which gets used).

arjunsuresh avatar Jan 09 '24 18:01 arjunsuresh
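For anyone reproducing the memory observations above, one way to have the accuracy script report its own peak memory is the standard-library resource module. This is a hedged sketch, not how the numbers quoted in this thread were obtained:

```python
# Sketch of reporting the accuracy script's peak memory. On Linux, ru_maxrss is
# in KiB. The figures quoted in this thread were observed externally (e.g. with
# system monitoring tools), not with this snippet.
import resource
import sys

def report_peak_rss() -> None:
    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"peak RSS: {peak_kib / 1024:.0f} MiB", file=sys.stderr)

# Call report_peak_rss() after COCOeval.summarize() to log the high-water mark.
```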