nanonets_object_tracking
Video and detections do not match.
It seems like the video linked in the README.md (https://drive.google.com/open?id=1h2Wnb98tDVB6JlCDNQXCeZpG20x6AiZ2) does not match the detections in Nanonets_object_tracking/det/.
Each of the det_*.txt files covers 1955 frames, while the video consists of 2110 frames. This is also confirmed visually: the bounding boxes (detections) do not match where the cars actually are, whether I use the provided model640.pt or a self-trained feature extractor on the given data, and the program crashes when trying to process frame 1956 (for good reason).
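For reference, the mismatch is easy to reproduce with a short script (a sketch using OpenCV and numpy; det/det_ssd512.txt is just one of the provided det_*.txt files, use whichever you have):

import cv2
import numpy as np

# Count the frames in the provided video
cap = cv2.VideoCapture('vdo.avi')
n_video_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

# Highest frame index referenced in a detection file
# (the first column of each row is the frame number)
frame_ids = np.loadtxt('det/det_ssd512.txt', delimiter=',', usecols=0)
n_det_frames = int(frame_ids.max())

print(n_video_frames, n_det_frames)  # prints 2110 and 1955 here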
Is there a new video or what is going on here?
Same issue XD
Same problem, is this model working?
I ran a detector on vdo.avi and dumped out detection results that match the video clip.
https://gist.github.com/yuntai/d0eb58b0eab620db65ac51e326be4c77
using detectron2 (COCO trained faster_rcnn_X_101_32x8d_FPN_3x) from https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md
Dear @yuntai, thank you for sharing. It is now working. Btw, how did you get the detections? I mean, can you be more specific?
I saw your detection result, for example this first line: 1,-1,126.682,445.587,489.079,205.913,0.996153,-1,-1,-1
Could you share what all these numbers mean?
Many thanks
Thanks @yuntai, that txt file works, and none of the others provided in the repo do.
@yuntai Hey, can you tell me how you generated the detections text file from detectron2? I know we can generate output videos or images with detectron2, but I'm not sure it can generate a detections text file. Any help would be appreciated, thank you!
1,-1,126.682,445.587,489.079,205.913,0.996153,-1,-1,-1
- I think the first number represents the frame number; when it is repeated, the repetitions correspond to the instances detected in that frame.
- The -1s are just there because the code selects only [2:6] of each line. You can either keep them as they are or modify the code to select [1:5] and remove the rest of the -1s.
- The next 4 numbers represent x1, y1, w, h, the bounding box of the detected object. You can get the width and height from x2-x1 and y2-y1, which you can read from the bounding box info: outputs["instances"].pred_boxes gives you the tensor, and outputs["instances"].pred_boxes[i].tensor[0, 0].data.cpu().numpy() gives the value (tensor[0, 0] for x1). You can find more about the data types in the detectron2 documentation: https://detectron2.readthedocs.io/tutorials/models.html#model-input-format
- The last number (0.996153), I think, represents the detection confidence.
You can basically write the numbers in that format in a text file and give the detections and the input video to the deepsort tracker, and it should work fine. :)
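To make the layout concrete, here is a small sketch of how one such row splits up (just my reading of the format, mirroring the [2:6] slice the tracker code uses):

import numpy as np

# One det.txt row: frame, id(-1), x, y, w, h, score, then three unused -1 fields
row = "1,-1,126.682,445.587,489.079,205.913,0.996153,-1,-1,-1"
vals = np.array(row.split(','), dtype=np.float64)

frame_no = int(vals[0])  # frame number (repeated once per detection in that frame)
bbox = vals[2:6]         # the [2:6] slice: x, y, w, h of the bounding box
score = vals[6]          # detection confidence

print(frame_no, bbox, score)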
Hey, @MinaAbdElMassih thank you for the input! I appreciate it. I'm currently trying to output the text file. Have you tried doing this before?
@anzy0621 I hadn't done this before, but I managed to modify the detector code in the detectron2 API to write the detection info in that format to a .txt file, and it worked. It's quite simple once you manage to get the needed values. :)
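Something along these lines should do it (a sketch rather than my exact code, using detectron2's standard model zoo and DefaultPredictor API and writing rows in the format described above):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Set up the COCO-trained detector mentioned earlier in the thread
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

cap = cv2.VideoCapture('vdo.avi')
frame_no = 0
with open('det.txt', 'w') as outf:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_no += 1
        inst = predictor(frame)["instances"].to("cpu")
        boxes = inst.pred_boxes.tensor.numpy()  # each row is x1, y1, x2, y2
        scores = inst.scores.numpy()
        for (x1, y1, x2, y2), s in zip(boxes, scores):
            # row format: frame, id(-1), x, y, w, h, score, -1, -1, -1
            print(f"{frame_no},-1,{x1},{y1},{x2 - x1},{y2 - y1},{s},-1,-1,-1",
                  file=outf)
cap.release()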
If I have only two classes to detect, what do the columns become?
None of the models in the repository works for me either. @yuntai's file worked for me, thanks. Btw, good explanation @MinaAbdElMassih.
Thanks mina🤗
@AntonioMarsella I modified my answer above; I think it better answers your question, as the -1s don't represent classes.
@yuntai The video has been removed from Google Drive. Can you share yours, please?
find . ...
on my HD got me this one. Can you check whether this is the correct one?
https://drive.google.com/file/d/1PTBXBfCKuSCNk6wUGZ7pAQyj4rcKRkql/view?usp=sharing
Found this thread continued only now, but my local git repo is still there! In demo/predictor.py in the detectron2 repo I added:
import numpy as np

outf = None  # output file handle, opened lazily on the first frame

def process_detected_instance(predictions, frame_no):
    global outf
    boxes = predictions.pred_boxes.tensor.numpy()
    scores = predictions.scores.numpy()
    classes = predictions.pred_classes.numpy()
    # keep only the COCO classes of interest
    # (person, bicycle, car, motorcycle, bus, truck)
    mask = np.isin(classes, [0, 1, 2, 3, 5, 7])
    boxes = boxes[mask]
    scores = scores[mask]
    classes = classes[mask]
    if outf is None:
        outf = open('det.txt', 'w')
    for i in range(len(classes)):
        x1, y1, x2, y2 = list(boxes[i])
        # convert the x1,y1,x2,y2 box to the x,y,w,h the det.txt format expects
        w = x2 - x1
        h = y2 - y1
        assert w > 0 and h > 0
        # row: frame, id(-1), x, y, w, h, score, -1, -1, -1
        print(','.join(
            list(map(str, [frame_no, -1])) +
            list(map(str, [x1, y1, w, h])) + [str(scores[i])] + ['-1'] * 3),
            file=outf, flush=True)
    print("frame_no({}) num({})".format(frame_no, len(classes)))
and added
process_detected_instance(predictions, frame_no)
under the elif "instances" in predictions: ... branch further down below.
Hi, could you please share your video again if you still have it? The link doesn't work anymore.
https://drive.google.com/file/d/1ADVZyR3BdWUm-saeM6GcFtbw6E2lUcKk/view?usp=sharing
Thanks a lot!