
about make Triplet dataset


Is this method selecting hard triplets online, or is it offline?

Kim-yonguk avatar Dec 11 '19 03:12 Kim-yonguk

I would say it is online, since you are only selecting the triplets within each batch that pass the hard-negative selection condition, instead of pre-computing the full set of qualifying triplets by doing a pass over the whole training set at the start of each epoch.
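The in-batch selection described above can be sketched roughly as follows. This is a NumPy illustration of the idea, not the repository's actual code; the function name and the use of plain Euclidean distance are my assumptions.

```python
import numpy as np

def select_hard_triplets(embeddings, labels):
    """Select (anchor, positive, negative) index triplets from one batch
    where the negative is 'hard': d(a, n) < d(a, p).
    Illustrative NumPy sketch; the repository operates on PyTorch tensors."""
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    triplets = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # positive must share the anchor's identity
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # negative must be a different identity
                if dist[a, neg] < dist[a, p]:  # hard-negative condition
                    triplets.append((a, p, neg))
    return triplets
```

Because the selection depends on distances computed from the current model's embeddings of the current batch, it has to happen during training, which is what makes it online.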

Please do keep in mind that my understanding may be wrong. I do not think the triplet generation before training counts as offline mining, since it only generates triplets randomly and does not pre-compute any embeddings to check the triplet selection condition. I used the triplet selection method from tbmoon's 'facenet' repository and edited it to save the generated triplets to a numpy file, to provide some 'reproducibility' in experiments, but the general way I know of generating triplets is to randomly pick anchors, positives, and negatives on the fly to prevent selection bias.
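That random, embedding-free generation could look something like this sketch. The function name, the dict-based dataset layout, and the explicit seed (standing in for the numpy-file trick mentioned above) are all my own illustrative choices, not tbmoon's or this repository's API.

```python
import random

def random_triplets(identity_to_images, num_triplets, seed=0):
    """Randomly generate (anchor, positive, negative) image triplets without
    ever looking at embeddings. A fixed seed gives reproducible triplets,
    similar in spirit to saving them to a numpy file."""
    rng = random.Random(seed)
    # Anchors/positives need identities with at least two images.
    candidates = [i for i, imgs in identity_to_images.items() if len(imgs) >= 2]
    triplets = []
    for _ in range(num_triplets):
        pos_id = rng.choice(candidates)
        neg_id = rng.choice([i for i in identity_to_images if i != pos_id])
        anchor, positive = rng.sample(identity_to_images[pos_id], 2)
        negative = rng.choice(identity_to_images[neg_id])
        triplets.append((anchor, positive, negative))
    return triplets
```

Note that nothing here checks whether the negative is actually hard; that can only be determined once the embeddings are computed.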

It seems you will need a large batch size to get better performance with the triplet loss method, so you will need a GPU with a lot of VRAM (preferably 24 GB or more) or multiple GPUs in parallel. I think the original FaceNet paper used a batch size of around 1800 triplets, enforced a certain number of face images per identity in each mini-batch (around 40), trained on a dataset containing hundreds of millions of face images, and used a semi-hard negative triplet selection method.
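For reference, the semi-hard condition and the triplet loss from the FaceNet paper can be written as two one-liners. The margin value 0.2 here is the one commonly cited from the paper; the function names are mine.

```python
def triplet_loss(d_ap, d_an, margin=0.2):
    """Triplet loss on anchor-positive and anchor-negative distances:
    penalize triplets where the negative is not at least `margin`
    farther from the anchor than the positive."""
    return max(d_ap - d_an + margin, 0.0)

def is_semi_hard(d_ap, d_an, margin=0.2):
    """Semi-hard negative: farther from the anchor than the positive,
    but still inside the margin, so the loss is positive yet not
    dominated by the very hardest (often mislabeled) negatives."""
    return d_ap < d_an < d_ap + margin
```

A semi-hard triplet therefore always contributes a loss strictly between 0 and `margin`, which is part of why it stabilizes training compared to using only the hardest negatives.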

It also seems that a plain cross-entropy classification on the VGGFace2 dataset, using an Inception-ResNet-V1 model architecture as in David Sandberg's 'facenet' repository, yields better results with less instability during training, so giving that a shot wouldn't hurt.
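The cross-entropy alternative just amounts to putting a linear classification head over the identity classes on top of the embedding and minimizing softmax cross-entropy. A minimal NumPy sketch of that loss, purely to show the math (not the repository's implementation):

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy over identity classes. `logits` would be the
    output of a linear head on top of the Inception-ResNet-V1 embedding."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]
```

Unlike the triplet loss, each sample contributes a gradient on its own, with no mining step and no dependence on which other samples share the batch, which is one plausible reason the training is more stable.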

If you find any more information please let me know.

tamerthamoqa avatar Dec 11 '19 11:12 tamerthamoqa

Before we compute the embeddings, it is not known whether the negative in a selected triplet is hard, semi-hard, or easy. Random generation before a pass might therefore yield many "easy" triplets. When these are fed into a large mini-batch to be evaluated during training, and only the hard/semi-hard ones are kept, we call it "online".
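That filtering step can be sketched as follows: once the batch's embeddings (and hence the distances) are known, each triplet falls into one of the three categories, and the easy ones are discarded because they contribute zero loss. The function name and margin are illustrative.

```python
def mine_batch(distance_pairs, margin=0.2):
    """Given (d_ap, d_an) distance pairs computed from a mini-batch's
    embeddings, keep only the hard and semi-hard triplets. This is the
    'online' selection step; numbers and names are illustrative."""
    kept = []
    for d_ap, d_an in distance_pairs:
        if d_an < d_ap:
            kept.append((d_ap, d_an, "hard"))
        elif d_an < d_ap + margin:
            kept.append((d_ap, d_an, "semi-hard"))
        # easy triplets (d_an >= d_ap + margin) give zero loss: dropped
    return kept
```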

AGenchev avatar Feb 16 '21 20:02 AGenchev