nanonets_object_tracking
Video and detections do not match.
It seems like the video linked in the README.md (https://drive.google.com/open?id=1h2Wnb98tDVB6JlCDNQXCeZpG20x6AiZ2) does not match the detections in Nanonets_object_tracking/det/.
Each of the det_*.txt files covers 1955 frames, while the video consists of 2110 frames. This is also confirmed visually: the bounding boxes (detections) do not match where the cars actually are, whether I use the provided model640.pt or a self-trained feature extractor on the given data, and the program crashes when trying to process frame 1956 (for good reason).
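For reference, the mismatch is easy to reproduce with a short script (a sketch using OpenCV and numpy; det/det_ssd512.txt is just one of the provided det_*.txt files, use whichever you have):

import cv2
import numpy as np

# Count the frames in the provided video
cap = cv2.VideoCapture('vdo.avi')
n_video_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

# Highest frame index referenced in a detection file
# (the first column of each row is the frame number)
frame_ids = np.loadtxt('det/det_ssd512.txt', delimiter=',', usecols=0)
n_det_frames = int(frame_ids.max())

print(n_video_frames, n_det_frames)  # prints 2110 and 1955 here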
Is there a new video or what is going on here?
Same issue XD
Same problem, is this model working?
I ran a detector on vdo.avi and dumped out detection results that match the video clip.
https://gist.github.com/yuntai/d0eb58b0eab620db65ac51e326be4c77
using detectron2 (COCO trained faster_rcnn_X_101_32x8d_FPN_3x) from https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md
Dear @yuntai, thank you for sharing. It is now working. Btw, how did you get the detections? I mean, can you be more specific?
I saw your detection result, for example this first line: 1,-1,126.682,445.587,489.079,205.913,0.996153,-1,-1,-1
Could you share what all these numbers mean?
Many thanks
Thanks @yuntai, that txt file works, and none of the others provided in the repo do.
@yuntai Hey, can you tell me how you generated the detections text file from detectron2? I know we can generate output videos or images with detectron2, but I'm not sure it can generate a detections text file. Any help would be appreciated, thank you!
1,-1,126.682,445.587,489.079,205.913,0.996153,-1,-1,-1
- I think the first number represents the frame number; when it is repeated, the repetitions correspond to the instances detected in that frame.
- The -1s are just there because the code selects only [2:6] of each line. You can either keep them as they are or modify the code to select [1:5] and remove the rest of the -1s.
- The next 4 numbers represent x1, y1, w, h, the bounding box of the detected object. You can get the width and height from x2-x1 and y2-y1, which you can read from the bounding box info: outputs["instances"].pred_boxes gives you the tensor, and outputs["instances"].pred_boxes[i].tensor[0, 0].data.cpu().numpy() gives the value (tensor[0, 0] for x1). You can find more about the data types in the detectron2 documentation: https://detectron2.readthedocs.io/tutorials/models.html#model-input-format
- The last number (0.996153), I think, represents the detection confidence.
You can basically write the numbers in that format in a text file and give the detections and the input video to the deepsort tracker, and it should work fine. :)
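To make the layout concrete, here is a small sketch of how one such row splits up (just my reading of the format, mirroring the [2:6] slice the tracker code uses):

import numpy as np

# One det.txt row: frame, id(-1), x, y, w, h, score, then three unused -1 fields
row = "1,-1,126.682,445.587,489.079,205.913,0.996153,-1,-1,-1"
vals = np.array(row.split(','), dtype=np.float64)

frame_no = int(vals[0])  # frame number (repeated once per detection in that frame)
bbox = vals[2:6]         # the [2:6] slice: x, y, w, h of the bounding box
score = vals[6]          # detection confidence

print(frame_no, bbox, score)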
Hey, @MinaAbdElMassih thank you for the input! I appreciate it. I'm currently trying to output the text file. Have you tried doing this before?
@anzy0621 I hadn't done this before, but I managed to modify the detector code in the detectron2 API to write the detection info in that format to a .txt file, and it worked. It's quite simple once you manage to get the needed values. :)
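Something along these lines should do it (a sketch rather than my exact code, using detectron2's standard model zoo and DefaultPredictor API and writing rows in the format described above):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Set up the COCO-trained detector mentioned earlier in the thread
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

cap = cv2.VideoCapture('vdo.avi')
frame_no = 0
with open('det.txt', 'w') as outf:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_no += 1
        inst = predictor(frame)["instances"].to("cpu")
        boxes = inst.pred_boxes.tensor.numpy()  # each row is x1, y1, x2, y2
        scores = inst.scores.numpy()
        for (x1, y1, x2, y2), s in zip(boxes, scores):
            # row format: frame, id(-1), x, y, w, h, score, -1, -1, -1
            print(f"{frame_no},-1,{x1},{y1},{x2 - x1},{y2 - y1},{s},-1,-1,-1",
                  file=outf)
cap.release()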
If I have only two classes to detect, what do the columns become?
None of the models in the repository works for me either. @yuntai's file worked for me, thanks. Btw, good explanation @MinaAbdElMassih.
Thanks mina🤗
@AntonioMarsella I modified my answer above; I think it better answers your question, as the -1s don't represent classes.
@yuntai The video has been removed from Google Drive. Can you share yours, please?
find . ...
on my HD got me this one. Can you check whether this is the correct one?
https://drive.google.com/file/d/1PTBXBfCKuSCNk6wUGZ7pAQyj4rcKRkql/view?usp=sharing
Found this thread continued only now, but my local git repo is still there! In demo/predictor.py in the detectron2 repo I added:
import numpy as np

outf = None  # output file handle, opened lazily on the first frame

def process_detected_instance(predictions, frame_no):
    global outf
    boxes = predictions.pred_boxes.tensor.numpy()
    scores = predictions.scores.numpy()
    classes = predictions.pred_classes.numpy()
    # keep only the COCO classes of interest
    # (person, bicycle, car, motorcycle, bus, truck)
    mask = np.isin(classes, [0, 1, 2, 3, 5, 7])
    boxes = boxes[mask]
    scores = scores[mask]
    classes = classes[mask]
    if outf is None:
        outf = open('det.txt', 'w')
    for i in range(len(classes)):
        x1, y1, x2, y2 = list(boxes[i])
        # convert the x1,y1,x2,y2 box to the x,y,w,h the det.txt format expects
        w = x2 - x1
        h = y2 - y1
        assert w > 0 and h > 0
        # row: frame, id(-1), x, y, w, h, score, -1, -1, -1
        print(','.join(
            list(map(str, [frame_no, -1])) +
            list(map(str, [x1, y1, w, h])) + [str(scores[i])] + ['-1'] * 3),
            file=outf, flush=True)
    print("frame_no({}) num({})".format(frame_no, len(classes)))
and added
process_detected_instance(predictions, frame_no)
under the elif "instances" in predictions: ... branch further down below.
Hi, could you please share your video again if you still have it? The link doesn't work anymore.
https://drive.google.com/file/d/1ADVZyR3BdWUm-saeM6GcFtbw6E2lUcKk/view?usp=sharing
Thanks a lot!