learn-to-cluster
How do I use your pretrained model for unlabeled face images?
Hi, I want to use this method to preprocess many unlabeled face images. How do I use your pretrained model to classify and label them? Thank you very much!
@SharharZ Hi, you can (1) use pretrained face recognition models to extract face features, and (2) use the clustering methods provided in this repo to group the face features.
@yl-1993 Thanks for your reply! Should I use generate_proposals.py to extract features and dsgcn/main.py to cluster? Could you also provide your pretrained model on Baidu Yun? And how many images does the code support? I may have millions of images.
@SharharZ Yes, you can follow the pipeline in scripts/pipeline.sh. As shown in our face clustering benchmark, it can handle at least 5M unlabeled face images.
@SharharZ The pretrained model has already been shared through Baidu Yun. Check out Setup and get data for more details.
@yl-1993 Thank you! I'm sorry, maybe I didn't describe it clearly. I mean the pretrained model of hfsoftmax. I analyzed the code and downloaded your data, but I am not sure how to generate the .bin and .npz files for my own face image data. In other words, I extract 512-dimensional face features; how do I convert them into your file format?
@SharharZ I think you can store your features with np.save. More details can be found in extract.py. Besides, I will upload the pretrained face recognition model to Baidu Yun soon.
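A minimal sketch of both storage options, assuming your features are a float32 array of shape (N, 512); the flat-binary layout for the .bin file is my assumption based on the provided feature files, so please verify it against the downloaded data:

```python
import numpy as np

# feats: your extracted face features, one 512-d vector per image.
# (Random data here only to keep the sketch self-contained.)
feats = np.random.rand(1000, 512).astype(np.float32)

# Option A: save as .npy with np.save (load back with np.load).
np.save('my_feats.npy', feats)

# Option B: save as a flat float32 .bin (assumed layout of the provided
# .bin feature files). Load back with np.fromfile(...).reshape(-1, 512).
feats.tofile('my_feats.bin')

# Sanity check: the round trip should reproduce the original array.
restored = np.fromfile('my_feats.bin', dtype=np.float32).reshape(-1, 512)
assert np.allclose(feats, restored)
```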
@yl-1993 Thank you very much!
@yl-1993 There are several different pretrained models for extracting face features in the link you provided. Which feature extraction model matches the pretrained clustering model?
@SharharZ Pretrained models for feature extraction have been uploaded to BaiduYun. You can find the link in the hfsoftmax wiki.
@jxyecn For the pretrained clustering model, we use ResNet-50 as the feature extractor.
- If you only want to try the clustering method, you can directly use the extracted features.
- If you want to extract your own features and train the clustering model, you can choose any model as the feature extractor (see the sketch below).
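As a generic illustration of "any model as the feature extractor", here is a minimal sketch using an off-the-shelf torchvision ResNet-50. This is not the exact extractor used for the released clustering model, and the L2-normalization step is my assumption, so check it against the provided features:

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Generic backbone as a stand-in feature extractor; replace it with your own
# face recognition model to get meaningful face features.
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep 2048-d features
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image_paths = ['face_0001.jpg', 'face_0002.jpg']  # your aligned face crops

feats = []
with torch.no_grad():
    for path in image_paths:
        img = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
        feat = backbone(img).squeeze(0).numpy()
        feats.append(feat / (np.linalg.norm(feat) + 1e-12))  # L2-normalize

feats = np.stack(feats).astype(np.float32)  # shape (N, 2048)
```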
@yl-1993 Thanks for the reply! But my understanding is that if a different face feature extraction model is used, the clustering model would need to be retrained, right? So I want to confirm which feature extraction model matches the released pretrained clustering model.
@jxyecn Yes, that is why the reply above says that if you want to extract your own features and train your own clustering model, you can choose any feature extraction model. Also, this ResNet-50's parameters are slightly different from those of the extractor used to train the pretrained clustering model; if you find it causes a large performance drop, feel free to leave a comment under this issue.
The question is how to use your main.py file. I wanted to provide extracted face features (face embeddings), but your config file seems to require training-related files as well. I suppose I should put the directory of the embeddings in the test path location (in "cfg_test_0.7_0.75.yaml"), but I can't figure out how this is going to work since it also takes a training file path. Can you explain this part a bit?
@engmubarak48 Thanks for pointing this out. For the testing part, it will read the training file path but not use it. I will refine this part to make it clearer. Currently, I think you can set a dummy training path or simply set the training path to be the same as the testing path.
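A small sketch of the "set the training path the same as the testing path" workaround, done by rewriting the YAML config programmatically. The key names `train_data` and `test_data` are placeholders I am assuming here, so match them to the actual keys in cfg_test_0.7_0.75.yaml:

```python
import yaml

# Load the provided test config.
with open('cfg_test_0.7_0.75.yaml') as f:
    cfg = yaml.safe_load(f)

# Assumed key names: point the (unused) training section at the test section
# so inference can run without a real training set.
cfg['train_data'] = dict(cfg['test_data'])

# Write a modified copy and pass it to dsgcn/main.py instead.
with open('cfg_test_dummy_train.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)
```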
@yl-1993 Thanks for your quick reply. I would like to ask: which part of your code extracts/generates the features of the images? I have read your generate_proposals.py file, and it seems to take .bin files. Do we have to extract the features on our own, or is there a file that extracts the features and saves them as a .bin file? I was hoping there would be a file in your repo that exploits the pretrained face recognition models and saves the extracted features in a format that the face clustering code accepts.
Thanks.
@engmubarak48 Since this repo focuses on the clustering framework, face recognition training and feature extraction are not included. You can check out hfsoftmax for the pretrained model and feature extraction. A similar discussion can be found in https://github.com/yl-1993/learn-to-cluster/issues/4.
@yl-1993 Since the data is unlabeled, I only have one file consisting of the extracted features (assuming I extract my features and save them as a .bin file). But in your test config file, there is a path pointing to a .meta file (which, as I understand it, holds the labels). What kind of labels are these, and why do we need them, given that we are clustering unlabeled images?
Or is the .meta file used only for evaluation, so it can be removed if evaluation is not needed?
Dear @yl-1993, what I intend to do is the following:
- extract features of my unlabeled image data via your extract_feat.py;
- then cluster the embeddings using your main.py script.
I also realized that your extract_feat.py in hfsoftmax reads images from a .bin file, so I think I should save my NumPy image data into a .bin file too.
Could you please clarify, step by step, what format my data should be in and what needs to be filled in the config file, both for extract_feat.py and main.py?
I would really appreciate it.
@engmubarak48
- You can use the FileListDataset, which takes a filelist and an image prefix as input:
```python
val_dataset = FileListDataset(
    args.val_list, args.val_root,
    transforms.Compose([
        transforms.Resize(args.image_size),
        transforms.CenterCrop(args.input_size),
        transforms.ToTensor(),
        normalize,
    ]))
```
- For feature extraction, we don't need a config file. You can use the following command:
```bash
python extract_feat.py \
    --arch {} \
    --batch-size {} \
    --input-size {} \
    --feature-dim {} \
    --load-path {} \
    --val_list {} \
    --val_root {} \
    --output-path {}
```
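Two small points around this command, as a hedged sketch rather than a definitive description of extract_feat.py: I am assuming the file passed via `--val_list` is a plain text file with one image path per line (relative to `--val_root`), and that the features saved at `--output-path` can be read back as a flat float32 array; verify both against extract_feat.py itself.

```python
import numpy as np

# 1) Write a filelist for --val_list: one image path per line,
#    relative to the directory given as --val_root (assumed format).
image_paths = ['person_a/0001.jpg', 'person_a/0002.jpg', 'person_b/0001.jpg']
with open('my_filelist.txt', 'w') as f:
    f.write('\n'.join(image_paths) + '\n')

# 2) After extract_feat.py finishes, load the saved features.
#    Assumed layout: a flat float32 binary of shape (N, feature_dim);
#    if the script saves .npy instead, use np.load.
feature_dim = 256  # must match the --feature-dim you passed
feats = np.fromfile('my_features.bin', dtype=np.float32).reshape(-1, feature_dim)
print(feats.shape)
```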
Dear @yl-1993,
The main question I asked is: what should I put in the .meta file if I don't have labels for the data? In your "cfg_test_0.7_0.75.yaml" config file, there is a path pointing to the file "part1_test.meta".
In general, I only want to cluster the images, put each cluster into a folder, and then check the clusters manually.
Thanks
@engmubarak48 Sorry for not fully understanding your question. For a quick fix, you can simply use a dummy meta for testing; it will not influence the clustering result. The meta file is currently used for measuring the difference between the predicted score and the ground-truth score, which is only a reference value in the test phase. This is a good point. We will support an empty meta during inference soon.
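A quick sketch of writing such a dummy meta, under the assumption that the .meta format is a plain text file with one integer label per line, one line per image, in the same order as the features (check part1_test.meta from the provided data to confirm):

```python
# Write a dummy meta aligned with the feature order.
# Assumption: one integer label per line, one line per image.
num_images = 940  # number of rows in your feature file

with open('dummy.meta', 'w') as f:
    for i in range(num_images):
        f.write(f'{i}\n')  # each image gets its own placeholder label
```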
@engmubarak48 https://github.com/yl-1993/learn-to-cluster/pull/17 removes unnecessary inputs during inference. For now, you only need to feed features and proposals into the trained network.
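For the end goal described above (putting each predicted cluster into its own folder for manual inspection), here is a minimal sketch, assuming you already have a list of image paths and a parallel array of predicted cluster labels from the pipeline; both variables below are illustrative placeholders:

```python
import os
import shutil

# image_paths[i] and pred_labels[i] must refer to the same face image.
image_paths = ['faces/0001.jpg', 'faces/0002.jpg', 'faces/0003.jpg']  # example
pred_labels = [0, 0, 1]                                               # example

out_root = 'clusters'
for path, label in zip(image_paths, pred_labels):
    cluster_dir = os.path.join(out_root, f'cluster_{label:05d}')
    os.makedirs(cluster_dir, exist_ok=True)
    shutil.copy(path, cluster_dir)  # copy (not move) so the originals stay put
```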
Which of the models you provide can I use to extract face features, so that I can then use the clustering model you provide (pretrained_gcn_d.pth.tar) to process my own images?
@felixfuu You can use resnet50-softmax as the feature extractor. (It is a little different from the feature extractor used to train the clustering model. If there is a big performance drop, feel free to report it under this issue.)
Thanks, @yl-1993, I already made it work back then when I was checking the performance. Do you have any further plans to improve the performance? I am working in this area (face clustering), so let me know if you are planning further research here; we might exchange some ideas.
How do I make an annotation file (.meta) for new data?
@felixfuu For clustering, you only need to feed features and proposals into the trained network.
The results of my experiment are not very good. I used 940 faces (many sharing the same IDs) and the clustering produced about 900 labels, so almost every picture ended up in its own cluster. @yl-1993
@yl-1993 I used resnet50-softmax as the feature extractor and followed the pipeline in scripts/pipeline.sh. Is there an error in this process?
@felixfuu The overall procedure is correct. I think there are two ways to check your results. (1) Check the extracted features. You can use scripts/generate_proposals.sh to generate cluster proposals, which can be regarded as clustering results. You may reduce k or maxsz for your data (940 instances). This step only depends on the extracted features and should yield reasonable results. (2) Check the pipeline. You can download the provided features and reproduce the result on ms1m.
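A quick, model-agnostic sanity check on the extracted features themselves, independent of the proposal/GCN pipeline: for reasonable face features, each image's nearest neighbor by cosine similarity should be much closer than a random image. A minimal sketch, assuming your 940 features are stored as a flat float32 .bin (the file name and shape below are placeholders):

```python
import numpy as np

# Load and L2-normalize features so that dot product = cosine similarity.
feats = np.fromfile('my_feats.bin', dtype=np.float32).reshape(940, -1)
feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12

sims = feats @ feats.T
np.fill_diagonal(sims, -1.0)  # ignore self-similarity

top1 = sims.max(axis=1)  # similarity to each image's nearest neighbor

# Similarity to one random non-self image per row, for comparison.
rng = np.random.default_rng(0)
n = len(feats)
rand_idx = (np.arange(n) + rng.integers(1, n, size=n)) % n
rand = sims[np.arange(n), rand_idx]

print('mean top-1 neighbor similarity:', top1.mean())
print('mean random-pair similarity  :', rand.mean())
# If these two numbers are close, the features barely separate identities,
# and no choice of k or maxsz will make the clustering look good.
```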
@yl-1993 Following your suggestion, I visualized the cluster proposals and the clustering result is not good, so the cause should be the features. In my experiment, k=20 and maxsz=100.