Can we have an independent PyTorch-based script for evaluating NAS detectors on the COCO dataset?
It looks like the YOLO-NAS architectures use a prediction format for model.predict(...) that differs from the standard COCO one.
Can we have a distinct, standalone PyTorch function that does not use the super-gradients code base but relies only on torch and the COCO eval APIs, and produces predictions in the standard format defined by COCO?
The standard COCO dataset has a fixed format for predictions, as follows:

```json
[
  {"image_id": ..., "category_id": ..., "bbox": [x, y, w, h], "score": ..., ...},
  {"image_id": ..., "category_id": ..., "bbox": [x, y, w, h], "score": ..., ...}
]
```
The indicative script could be something like the following. It might need a few more changes, but I hope you get the idea.

```python
net = models.get("yolo_nas_m", pretrained_weights="coco")
net.eval()
net.cuda()

outputs = []
for each in dataloader:
    pred = net(each)
    # convert the model prediction to the standard COCO format
    outputs.append(convert_to_coco_format(pred))

# feed outputs to the standard COCO eval APIs
```
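For that last step, pycocotools can consume such a list directly. A minimal sketch, assuming `outputs` is a flat list of COCO-format dicts as above and that the ground-truth annotation path (`annotations/instances_val2017.json` here) matches your local setup:

```python
import json
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Load the ground truth and the detections produced above
coco_gt = COCO("annotations/instances_val2017.json")
with open("predictions.json", "w") as f:
    json.dump(outputs, f)
coco_dt = coco_gt.loadRes("predictions.json")

# Standard COCO bbox evaluation
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```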
Also, this type of function might help compare the performance of the predict function from super-gradients against the standard COCO APIs. I see that many people have raised concerns that such an independent script could address:
https://github.com/Deci-AI/super-gradients/issues/958
https://github.com/Deci-AI/super-gradients/issues/1016
https://github.com/Deci-AI/super-gradients/issues/977
Hi. I don't see how the referenced issues are related to this matter. We intentionally re-implemented the mAP metric, moving from the inconvenient pycocotools implementation to a native PyTorch metric that is faster and DDP-friendly. The only place where a pycocotools-based metric is a viable option is academic research, where you want to compare many models coming from different sources and really want to compare them using the same evaluation methodology. We are open to external contributions here.
I appreciate your team's efforts in reimplementing mAP in a more efficient manner, but I have a requirement, for research purposes, which is exactly the case you mentioned above.
So, can you please give me a documentation link or some guidance on how to convert the model output to the standard COCO format? Or at least tell me what output format YOLO-NAS uses, so that I can convert it to the standard COCO one?
Let's assume you have a trained model. The easiest way to get predictions from the model is to use the model.predict API. We designed it to be the most convenient option for users who just want to get predictions. Image normalization, size preprocessing and NMS are done for you automatically.
```python
import super_gradients
from PIL import Image

yolo_nas = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()
result: ImagesDetectionPrediction = yolo_nas.predict(Image.open("lena.jpg"))
```
Note that the returned result is an ImagesDetectionPrediction container that may contain detection results for multiple frames; that is, you can use the predict method on a whole directory of images or on a video. Anyhow, if you're sending a single image, you can use result[0] to get the bounding boxes. Or you can iterate: for image_level_predictions in result.
```python
yolo_nas = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()
result: ImagesDetectionPrediction = yolo_nas.predict(Image.open("lena.jpg"))
predictions: ImagePrediction = result[0]

# And now you have it:
# Decoded bounding boxes (I think the bounding box format is pretty clear here, right?)
predictions.prediction.bboxes_xyxy  # [N, 4]
predictions.prediction.confidence   # [N]
predictions.prediction.labels       # [N]
predictions.class_names
```
Bounding boxes are returned in the coordinate system of the original image, so you can put them directly into the COCO JSON file according to its format.
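For reference, a minimal sketch of that conversion for one image. `image_id` and `label_to_category_id` (the mapping from the model's label index to the COCO category id, which is not contiguous in the official annotations) are hypothetical inputs you would supply for your dataset:

```python
def prediction_to_coco(predictions, image_id, label_to_category_id):
    """Convert a single ImagePrediction to a list of COCO-format detection dicts."""
    coco_results = []
    for (x1, y1, x2, y2), score, label in zip(
        predictions.prediction.bboxes_xyxy,
        predictions.prediction.confidence,
        predictions.prediction.labels,
    ):
        coco_results.append({
            "image_id": image_id,
            "category_id": label_to_category_id[int(label)],
            # COCO expects [x, y, width, height], not [x1, y1, x2, y2]
            "bbox": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],
            "score": float(score),
        })
    return coco_results
```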
The predict function is going to be very slow, as it predicts images one by one. Can we have a function that takes the output of the last layer and converts it to the standard COCO format? I need this because our underlying framework is based on the standard format; it just takes the last layer's output.

```python
outputs = model(inputs)  # inputs is a tensor of images in NCHW format, so images can be passed as a batch
results_dict = convert_to_coco_format(outputs)  # results_dict will be in the standard COCO format
```
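A sketch of what such a function could look like, not the library's own implementation. It assumes the raw forward pass returns a tuple of decoded boxes `[B, N, 4]` in xyxy image coordinates and class scores `[B, N, num_classes]` before NMS; verify this against the head of the exact model you use, and rescale the boxes if your dataloader resizes images. `image_ids` and `label_to_category_id` are hypothetical inputs you would supply:

```python
import torch
from torchvision.ops import batched_nms

@torch.no_grad()
def convert_to_coco_format(outputs, image_ids, label_to_category_id,
                           score_thr=0.01, iou_thr=0.65):
    # Assumed raw output layout: boxes [B, N, 4] (xyxy), scores [B, N, C]
    pred_bboxes, pred_scores = outputs
    results = []
    for img_id, boxes, scores in zip(image_ids, pred_bboxes, pred_scores):
        conf, labels = scores.max(dim=1)  # best class per candidate box
        keep = conf > score_thr
        boxes, conf, labels = boxes[keep], conf[keep], labels[keep]
        keep = batched_nms(boxes, conf, labels, iou_thr)  # class-aware NMS
        for box, s, l in zip(boxes[keep], conf[keep], labels[keep]):
            x1, y1, x2, y2 = box.tolist()
            results.append({
                "image_id": int(img_id),
                "category_id": label_to_category_id[int(l)],
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO xywh
                "score": float(s),
            })
    return results
```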
Thanks a lot for the detailed answer @BloodAxe. I definitely see the benefits of the model.predict API.
But if we want to use the fp16 version of the PyTorch model, how does model.predict work in this case?
Converting the PyTorch YOLO-NAS model is easy and can be done like this: `model = yolo_nas_l.half()`.
But now how do we change the input to support fp16?
Also, does model.predict accept torch tensors instead of numpy arrays or images?
> But if we want to use the fp16 version of the PyTorch model, how does model.predict work in this case? Converting the PyTorch YOLO-NAS model is easy and can be done like this: `model = yolo_nas_l.half()`
Converting to half seems to be working fine:

```python
model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco").cuda().half()
model.predict("https://deci-datasets-research.s3.amazonaws.com/image_samples/beatles-abbeyroad.jpg").show()
```
The predict method takes a numpy image, a folder of images, or URLs as input. Hope this answers your question.
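If you want to bypass predict and run the raw model on tensors in fp16, a minimal sketch; the preprocessing (resizing, normalization) and the 640x640 input size are assumptions you would adapt to your pipeline:

```python
import torch

model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco").cuda().half().eval()

# A batch of already-preprocessed images in NCHW layout; cast to half to match the model weights
inputs = torch.rand(4, 3, 640, 640).cuda().half()

with torch.no_grad():
    outputs = model(inputs)  # raw head outputs, before NMS
```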