FairMOT
Problem of CrowdHuman dataset?
The CrowdHuman dataset doesn't have any ID tags, so how does training succeed?
We give each bbox a unique ID, and the total number of IDs is larger than 1,000,000.
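For anyone wondering what "a unique ID per bbox" means in practice, here is a minimal sketch of labelling every box in an image-level dataset with its own identity. The annotation layout (a list of dicts with `image` and `boxes` fields) is hypothetical, not CrowdHuman's actual .odgt schema.

```python
# Sketch: give every bounding box in the dataset its own identity label.
# The annotation format below is illustrative, not CrowdHuman's real schema.
from typing import Dict, List


def assign_unique_ids(annotations: List[Dict]) -> List[Dict]:
    """Assign a globally unique identity to every box across all images."""
    next_id = 0
    labeled = []
    for image_ann in annotations:
        boxes_with_ids = []
        for box in image_ann["boxes"]:
            boxes_with_ids.append({"bbox": box, "track_id": next_id})
            next_id += 1
        labeled.append({"image": image_ann["image"], "boxes": boxes_with_ids})
    return labeled


if __name__ == "__main__":
    dummy = [
        {"image": "a.jpg", "boxes": [[0, 0, 10, 20], [5, 5, 15, 30]]},
        {"image": "b.jpg", "boxes": [[2, 2, 8, 18]]},
    ]
    labeled = assign_unique_ids(dummy)
    print(sum(len(a["boxes"]) for a in labeled))  # every box counts as one identity
```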
Can you share the details of the self-supervised training on CrowdHuman? Thank you.
I would also be interested in learning more about the self-supervised learning part (especially about the implementation).
Great work and great model!
As far as I understand, the self-supervised learning in FairMOT treats the refinement of object features as a classification task, meaning the number of classes equals the number of object instances in the dataset, e.g. around 100,000 in the CrowdHuman dataset.
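In other words, the Re-ID branch gets a linear classifier with one output per annotated instance and is trained with cross-entropy. A minimal sketch of that idea is below; the names, embedding size, and identity count are illustrative, not FairMOT's exact code.

```python
# Sketch: Re-ID as classification over instance identities.
import torch
import torch.nn as nn

EMB_DIM = 128             # dimension of the identity embedding per object center
NUM_IDENTITIES = 100_000  # one class per annotated box in the dataset (illustrative)

id_classifier = nn.Linear(EMB_DIM, NUM_IDENTITIES)
id_loss = nn.CrossEntropyLoss(ignore_index=-1)  # -1 could mark boxes without an id

# Suppose `embeddings` are Re-ID features gathered at ground-truth object centers
# and `identity_targets` are the unique ids assigned to those boxes.
embeddings = torch.randn(8, EMB_DIM)
identity_targets = torch.randint(0, NUM_IDENTITIES, (8,))

loss = id_loss(id_classifier(embeddings), identity_targets)
loss.backward()
```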
Thanks for the fast comment! :) You are probably right. When I read the paper, I was hoping there would be something more along the lines of contrastive learning going on. From the quote of the FairMOT paper which I pasted below, I thought there would be several instances of each object, created by random transformations, which could then be used to learn the representation.
"Inspired by [51], we regard each object instance in the dataset as a separate class and different transformations of the same object as instances in the same class. The adopted transformations include HSV augmentation, rotation, scaling, translation and shearing." (51 is this paper)
So during pre-training on CrowdHuman or other image-level datasets such as COCO, "self-supervised" means that only the Re-ID head is trained in a self-supervised way, while the other three heads are trained with supervision. Did I get that right?
And in this self-supervised training task, the Re-ID head is encouraged to distinguish as many instances as possible?