learn-to-cluster
How do I use your pretrained model for unlabeled face images?
Hi, I want to use this method to preprocess many unlabeled face images. How do I use your pretrained model to classify and label them? Thank you very much!
@SharharZ Hi, you can (1) use pretrained face recognition models to extract face features, and (2) use the clustering methods provided in this repo to group the face features.
@yl-1993 Thanks for your reply! Should I use generate_proposals.py to extract features and dsgcn/main.py to cluster? Could you also provide your pretrained model on Baidu Yun? And how many images does the code support? I may have millions of images.
@SharharZ Yes, you can follow the pipeline in scripts/pipeline.sh. As shown in our face clustering benchmark, it can handle at least 5M unlabeled face images.
@SharharZ The pretrained model has already been shared through Baidu Yun. Check out Setup and get data for more details.
@yl-1993 Thank you! I'm sorry, maybe I didn't describe it clearly. I mean the pretrained model of hfsoftmax. I analyzed the code and downloaded your data, but I am not sure how to generate the .bin and .npz files for my own face image data. In other words, I extract 512-dimensional face features; how do I convert them into your file format?
@SharharZ I think you can store your features with np.save. More details can be found in extract.py. Besides, I will upload the pretrained face recognition model to Baidu Yun soon.
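A minimal sketch of both storage options, assuming your features are a float32 array of shape (N, 512); the flat-binary layout for the .bin file is my assumption based on the provided feature files, so please verify it against the downloaded data:

```python
import numpy as np

# feats: your extracted face features, one 512-d vector per image.
# (Random data here only to keep the sketch self-contained.)
feats = np.random.rand(1000, 512).astype(np.float32)

# Option A: save as .npy with np.save (load back with np.load).
np.save('my_feats.npy', feats)

# Option B: save as a flat float32 .bin (assumed layout of the provided
# .bin feature files). Load back with np.fromfile(...).reshape(-1, 512).
feats.tofile('my_feats.bin')

# Sanity check: the round trip should reproduce the original array.
restored = np.fromfile('my_feats.bin', dtype=np.float32).reshape(-1, 512)
assert np.allclose(feats, restored)
```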
@yl-1993 Thank you very much!
@yl-1993 There are several different pretrained models for extracting face features in the link you provided. Which feature extraction model matches the pretrained clustering model?
@SharharZ Pretrained models for feature extraction have been uploaded to BaiduYun. You can find the link in the hfsoftmax wiki.
@jxyecn For the pretrained clustering model, we use ResNet-50 as the feature extractor.
- If you only want to try the clustering method, you can directly use the extracted features.
- If you want to extract your own features and train the clustering model, you can choose any model as the feature extractor (see the sketch below).
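As a generic illustration of "any model as the feature extractor", here is a minimal sketch using an off-the-shelf torchvision ResNet-50. This is not the exact extractor used for the released clustering model, and the L2-normalization step is my assumption, so check it against the provided features:

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Generic backbone as a stand-in feature extractor; replace it with your own
# face recognition model to get meaningful face features.
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep 2048-d features
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image_paths = ['face_0001.jpg', 'face_0002.jpg']  # your aligned face crops

feats = []
with torch.no_grad():
    for path in image_paths:
        img = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
        feat = backbone(img).squeeze(0).numpy()
        feats.append(feat / (np.linalg.norm(feat) + 1e-12))  # L2-normalize

feats = np.stack(feats).astype(np.float32)  # shape (N, 2048)
```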
@yl-1993 Thanks for the reply! But my understanding is that if a different face feature extraction model is used, the clustering model would need to be retrained, right? So I want to confirm which feature extraction model matches the released pretrained clustering model.
@jxyecn Yes, that is why the reply above says that if you want to extract your own features and train your own clustering model, you can choose any feature extraction model. Also, this ResNet-50's parameters are slightly different from those of the extractor used to train the pretrained clustering model; if you find it causes a large performance drop, feel free to leave a comment under this issue.
The question is how to use your main.py file. I wanted to provide extracted face features (face embeddings), but your config file seems to require training-related files as well. I suppose I should put the directory of the embeddings in the test path location (in "cfg_test_0.7_0.75.yaml"), but I can't figure out how this is going to work since it also takes a training file path. Can you explain this part a bit?
@engmubarak48 Thanks for pointing this out. For the testing part, it will read the training file path but not use it. I will refine this part to make it clearer. Currently, I think you can set a dummy training path or simply set the training path to be the same as the testing path.
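A small sketch of the "set the training path the same as the testing path" workaround, done by rewriting the YAML config programmatically. The key names `train_data` and `test_data` are placeholders I am assuming here, so match them to the actual keys in cfg_test_0.7_0.75.yaml:

```python
import yaml

# Load the provided test config.
with open('cfg_test_0.7_0.75.yaml') as f:
    cfg = yaml.safe_load(f)

# Assumed key names: point the (unused) training section at the test section
# so inference can run without a real training set.
cfg['train_data'] = dict(cfg['test_data'])

# Write a modified copy and pass it to dsgcn/main.py instead.
with open('cfg_test_dummy_train.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)
```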
@yl-1993 Thanks for your quick reply. I would like to ask: which part of your code extracts/generates the features of the images? I have read your generate_proposals.py file, and it seems to take .bin files. Do we have to extract the features on our own, or is there a file that extracts the features and saves them as a .bin file? I was hoping there would be a file in your repo that exploits the pretrained face recognition models and saves the extracted features in a format that the face clustering code accepts.
Thanks.
@engmubarak48 Since this repo focuses on the clustering framework, face recognition training and feature extraction are not included. You can check out hfsoftmax for the pretrained model and feature extraction. A similar discussion can be found in https://github.com/yl-1993/learn-to-cluster/issues/4.
@yl-1993 Since the data is unlabeled, I only have one file consisting of the extracted features (assuming I extract my features and save them as a .bin file). But in your test config file, there is a path pointing to a .meta file (which, as I understand it, holds the labels). What kind of labels are these, and why do we need them, given that we are clustering unlabeled images?
Or is the .meta file used only for evaluation, so it can be removed if evaluation is not needed?
Dear @yl-1993, what I intend to do is the following:
- extract features of my unlabeled image data via your extract_feat.py;
- then cluster the embeddings using your main.py script.
I also realized that your extract_feat.py in hfsoftmax reads images from a .bin file, so I think I should save my NumPy image data into a .bin file too.
Could you please clarify, step by step, what format my data should be in and what needs to be filled in the config file, both for extract_feat.py and main.py?
I would really appreciate it.
@engmubarak48
- You can use the FileListDataset, which takes a filelist and an image prefix as input:
```python
val_dataset = FileListDataset(
    args.val_list, args.val_root,
    transforms.Compose([
        transforms.Resize(args.image_size),
        transforms.CenterCrop(args.input_size),
        transforms.ToTensor(),
        normalize,
    ]))
```
- For feature extraction, we don't need a config file. You can use the following command:
```bash
python extract_feat.py \
    --arch {} \
    --batch-size {} \
    --input-size {} \
    --feature-dim {} \
    --load-path {} \
    --val_list {} \
    --val_root {} \
    --output-path {}
```
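Two small points around this command, as a hedged sketch rather than a definitive description of extract_feat.py: I am assuming the file passed via `--val_list` is a plain text file with one image path per line (relative to `--val_root`), and that the features saved at `--output-path` can be read back as a flat float32 array; verify both against extract_feat.py itself.

```python
import numpy as np

# 1) Write a filelist for --val_list: one image path per line,
#    relative to the directory given as --val_root (assumed format).
image_paths = ['person_a/0001.jpg', 'person_a/0002.jpg', 'person_b/0001.jpg']
with open('my_filelist.txt', 'w') as f:
    f.write('\n'.join(image_paths) + '\n')

# 2) After extract_feat.py finishes, load the saved features.
#    Assumed layout: a flat float32 binary of shape (N, feature_dim);
#    if the script saves .npy instead, use np.load.
feature_dim = 256  # must match the --feature-dim you passed
feats = np.fromfile('my_features.bin', dtype=np.float32).reshape(-1, feature_dim)
print(feats.shape)
```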
Dear @yl-1993,
The main question I asked is: what should I put in the .meta file if I don't have labels for the data? In your "cfg_test_0.7_0.75.yaml" config file, there is a path pointing to the file "part1_test.meta".
In general, I only want to cluster the images, put each cluster into a folder, and then check the clusters manually.
Thanks
@engmubarak48 Sorry for not fully understanding your question. For a quick fix, you can simply use a dummy meta for testing; it will not influence the clustering result. The meta file is currently used for measuring the difference between the predicted score and the ground-truth score, which is only a reference value in the test phase. This is a good point. We will support an empty meta during inference soon.
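A quick sketch of writing such a dummy meta, under the assumption that the .meta format is a plain text file with one integer label per line, one line per image, in the same order as the features (check part1_test.meta from the provided data to confirm):

```python
# Write a dummy meta aligned with the feature order.
# Assumption: one integer label per line, one line per image.
num_images = 940  # number of rows in your feature file

with open('dummy.meta', 'w') as f:
    for i in range(num_images):
        f.write(f'{i}\n')  # each image gets its own placeholder label
```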
@engmubarak48 https://github.com/yl-1993/learn-to-cluster/pull/17 removes unnecessary inputs during inference. For now, you only need to feed features and proposals into the trained network.
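For the end goal described above (putting each predicted cluster into its own folder for manual inspection), here is a minimal sketch, assuming you already have a list of image paths and a parallel array of predicted cluster labels from the pipeline; both variables below are illustrative placeholders:

```python
import os
import shutil

# image_paths[i] and pred_labels[i] must refer to the same face image.
image_paths = ['faces/0001.jpg', 'faces/0002.jpg', 'faces/0003.jpg']  # example
pred_labels = [0, 0, 1]                                               # example

out_root = 'clusters'
for path, label in zip(image_paths, pred_labels):
    cluster_dir = os.path.join(out_root, f'cluster_{label:05d}')
    os.makedirs(cluster_dir, exist_ok=True)
    shutil.copy(path, cluster_dir)  # copy (not move) so the originals stay put
```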
Which of the models you provide can I use to extract face features, so that I can then use the clustering model you provide (pretrained_gcn_d.pth.tar) to process my own images?
@felixfuu You can use resnet50-softmax as the feature extractor. (It is a little different from the feature extractor used to train the clustering model. If there is a big performance drop, feel free to report it under this issue.)
Thanks, @yl-1993, I already made it work back then when I was checking the performance. Do you have any further plans to improve the performance? I am working in this area (face clustering), so let me know if you are planning further research here; we might exchange some ideas.
How do I make an annotation file (.meta) for new data?
@felixfuu For clustering, you only need to feed features and proposals into the trained network.
The results of my experiment are not very good. I used 940 faces (many sharing the same IDs) and the clustering produced about 900 labels, so almost every picture ended up in its own cluster. @yl-1993
@yl-1993 I used resnet50-softmax as the feature extractor and followed the pipeline in scripts/pipeline.sh. Is there an error in this process?
@felixfuu The overall procedure is correct. I think there are two ways to check your results. (1) Check the extracted features. You can use scripts/generate_proposals.sh to generate cluster proposals, which can be regarded as clustering results. You may reduce k or maxsz for your data (940 instances). This step only depends on the extracted features and should yield reasonable results. (2) Check the pipeline. You can download the provided features and reproduce the result on ms1m.
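A quick, model-agnostic sanity check on the extracted features themselves, independent of the proposal/GCN pipeline: for reasonable face features, each image's nearest neighbor by cosine similarity should be much closer than a random image. A minimal sketch, assuming your 940 features are stored as a flat float32 .bin (the file name and shape below are placeholders):

```python
import numpy as np

# Load and L2-normalize features so that dot product = cosine similarity.
feats = np.fromfile('my_feats.bin', dtype=np.float32).reshape(940, -1)
feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12

sims = feats @ feats.T
np.fill_diagonal(sims, -1.0)  # ignore self-similarity

top1 = sims.max(axis=1)  # similarity to each image's nearest neighbor

# Similarity to one random non-self image per row, for comparison.
rng = np.random.default_rng(0)
n = len(feats)
rand_idx = (np.arange(n) + rng.integers(1, n, size=n)) % n
rand = sims[np.arange(n), rand_idx]

print('mean top-1 neighbor similarity:', top1.mean())
print('mean random-pair similarity  :', rand.mean())
# If these two numbers are close, the features barely separate identities,
# and no choice of k or maxsz will make the clustering look good.
```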
@yl-1993 Following your suggestion, I visualized the cluster proposals and the clustering result is not good, so the cause should be the features. In my experiment, k=20 and maxsz=100.