arcface-pytorch
dataset folder
Hi, I want to test your arcface code, but there is no dataset folder! How can I find it?
For testing the code, you need to download the LFW dataset and place it under the 'data/Datasets/lfw/' folder (which you have to create yourself). For more information, look at the 'config.py' file (lines 21-22):
lfw_root = 'data/Datasets/lfw/lfw-align-128'
lfw_test_list = 'data/Datasets/lfw/lfw_test_pair.txt'
These lines give the locations where you need to place the dataset and the corresponding pair-list file.
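As a minimal sketch, the folder hierarchy from those two config lines can be created like this (the LFW images and the pair list still have to be downloaded and placed there by hand):

```python
import os

# Paths copied from config.py (lines 21-22)
lfw_root = 'data/Datasets/lfw/lfw-align-128'
lfw_test_list = 'data/Datasets/lfw/lfw_test_pair.txt'

# Create the folder hierarchy; the extracted LFW images go under
# lfw_root, and the pair-list file sits next to that folder.
os.makedirs(lfw_root, exist_ok=True)
```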
I searched the LFW website (http://vis-www.cs.umass.edu/lfw/) and found several packages of the dataset. The lfw-deepfunneled images have a size of 250x250, which I guess doesn't match lfw-align-128. Would you please list the source of the dataset used in this project, or the details of how it was built? I just couldn't help but want to figure out what the images look like and what the txt files contain. I'm a rookie at face recognition, thanks!
I couldn't get my hands on the dataset via the baidu link in the repo (you could try that), but you could use that 250x250 dataset too, with a few changes.
- In test.py, just before line 37, add the resizing step:

      ...
      if image is None:
          return None
      image = cv2.resize(image, (128, 128))  # add this line
      image = np.dstack((image, np.fliplr(image)))
      ...
This lets the model work as expected, and since the images are already square, resizing to 128x128 won't distort them.
- The text file listing the image pairs has to follow the format used in lfw_test_pair.txt. If you encounter any issue there, mention it here.
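For reference, each line of lfw_test_pair.txt appears to pair two image paths (relative to lfw_root) with a label: 1 for the same person, 0 for different people. Here is a minimal sketch of generating such a file; the person names and the naive pairing strategy are made up purely for illustration:

```python
import itertools
import random

def make_pair_list(people, out_path, n_neg=2):
    """Write 'img1 img2 label' lines: 1 = same person, 0 = different people.

    `people` maps a person's name to a list of that person's image paths.
    """
    rng = random.Random(0)  # fixed seed so the output is reproducible
    lines = []
    # Positive pairs: every combination of two images of the same person.
    for name, images in people.items():
        for a, b in itertools.combinations(images, 2):
            lines.append(f'{a} {b} 1')
    # Negative pairs: one image each from two different people.
    names = list(people)
    for _ in range(n_neg):
        p1, p2 = rng.sample(names, 2)
        lines.append(f'{rng.choice(people[p1])} {rng.choice(people[p2])} 0')
    with open(out_path, 'w') as f:
        f.write('\n'.join(lines) + '\n')

# Hypothetical folder-per-person layout mirroring lfw-align-128:
people = {
    'Alice': ['Alice/Alice_0001.jpg', 'Alice/Alice_0002.jpg'],
    'Bob': ['Bob/Bob_0001.jpg', 'Bob/Bob_0002.jpg'],
}
make_pair_list(people, 'lfw_test_pair.txt')
```

The real list shipped with the repo is the authoritative reference for the exact format; this only shows the general shape.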
I just couldn't help but want to figure out what the images look like and what the txt files contain.
About this, the images are simply labelled faces of people (a.k.a. Labelled Faces in the Wild or LFW). About that text file, I've already specified in point 2 above.
@garg-7 Thanks! The Baidu link (password: b2ec) works for me. But would you please explain what CASIA-maxpy-clean-crop-144 means, and what the structure of train/val_data_13939.txt is? I downloaded the CASIA-WebFace dataset; how should I clean it?
@sureatgithub I didn't get what you meant by train/val_data_13939.txt. Where exactly is this file?
I had actually used my own smaller dataset and I do not have the lfw-align-128, which is why I'm not familiar with its directory structure.
@garg-7 In the config.py line 15 and 16:
train_list = '/data/Datasets/webface/train_data_13938.txt'
val_list = '/data/Datasets/webface/val_data_13938.txt'
They don't exist in this repo, and that's where I'm confused. I want to transfer the arcface method to another recognition task, so I have to build a custom dataset and train the model from scratch. To do that, I think it's necessary to understand how the training set is organized.
@sureatgithub Oh. I haven't used it for training, and this thread was about testing, so I figured you were asking about that. I'm afraid I won't be able to help you. You could try looking into train.py and data/dataset.py to get an idea of the format of the .txt files being used.
Otherwise, this can be answered by @ronghuaiyang only.
PS. About the dataset - take a look here.
@garg-7 Thanks all the same. I'm reading the source code, trying to infer the directory structure of the training set. And yeah, this is a detail only @ronghuaiyang knows, but it seems he hasn't been here for a long time.
@garg-7 Hello, I just started to learn face recognition. If I want to use another database, how do I create this text file, and is there specific code for that?
Now I have done a project on my own custom dataset with this code, and I wonder why I asked such a stupid question back then. Don't stick to face recognition; this code essentially solves a generic classification task. How the dataset is organized or how the text file is laid out is not important. The key is to get the path and label of each image in your dataset class definition. My train.txt looks like this:
0000/0000_1_f.png 0
0000/0000_0_u.png 0
0001/0001_1_b.png 1
0001/0001_1_f.png 1
0001/0001_0_l.png 1
0001/0001_1_d.png 1
0001/0001_0.png 1
0001/0001_0_d.png 1
0002/0002_1.png 2
0002/0002_0.png 2
0002/0002_1_u.png 2
As you can see, 0000, 0001, 0002... are the folders holding images of the same class, and 0, 1, 2... in the text file are just the ids you give each class. Whether they are 0, 1, 2 is not important; just make them different.
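The folder-per-class layout described above can be turned into such a train.txt automatically. A rough sketch (the helper name and the tiny throwaway demo dataset are made up for illustration):

```python
import os
import tempfile

def write_train_list(root, out_path):
    """Walk a folder-per-class dataset and write 'folder/file.png label' lines.

    Each immediate subfolder of `root` is one class; labels are consecutive
    integers, and only their distinctness matters.
    """
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    with open(out_path, 'w') as f:
        for label, cls in enumerate(classes):
            for name in sorted(os.listdir(os.path.join(root, cls))):
                f.write(f'{cls}/{name} {label}\n')

# Build a tiny throwaway dataset just to demonstrate the format.
root = tempfile.mkdtemp()
for cls, names in [('0000', ['0000_0_u.png']),
                   ('0001', ['0001_0.png', '0001_1_f.png'])]:
    os.makedirs(os.path.join(root, cls))
    for n in names:
        open(os.path.join(root, cls, n), 'w').close()
write_train_list(root, 'train.txt')
```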
@surefyyq OK, thanks. I'm just starting to learn and don't know a lot about it yet. Thank you very much for your answer.
@surefyyq Hello, I'd like to ask whether you have successfully run this on your own database? I've just started to learn and don't know how to load my own data into this program. Can you give me an answer? Thank you.
Hi, I followed this blog and successfully ran the code.
The cleaned dataset CASIA-maxpy-clean has 455,594 images (out of 494,414 in total) and 10,575 subjects. The cleaned_list contains the names of the cleaned images, and the data in CASIA-maxpy-clean is already cleaned.
I think train_data_13938.txt contains the paths/names of images used for cross-validation, and here we can use the whole set of CASIA-WebFace images for training.
The settings in config are changed to:
# train_root = './Datasets/webface/CASIA-maxpy-clean-crop-144/'
train_root = './Datasets/webface/CASIA-maxpy-clean'
# train_list = './Datasets/webface/train_data_13938.txt'
train_list = './Datasets/webface/cleaned_list.txt'
# val_list = './Datasets/webface/val_data_13938.txt'
# test_root = './Datasets/anti-spoofing/test/data_align_256'
# test_list = 'test.txt'
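If you still want a validation list like the commented-out val_list above, one option is to split cleaned_list.txt by line. A minimal sketch; it is format-agnostic (it just shuffles whole lines), and the file names match the config above:

```python
import random

def split_list(src, train_out, val_out, val_frac=0.05, seed=0):
    """Randomly split a one-line-per-image list file into train and val lists."""
    lines = open(src).read().splitlines()
    random.Random(seed).shuffle(lines)  # fixed seed for reproducibility
    n_val = int(len(lines) * val_frac)
    with open(val_out, 'w') as f:
        f.write('\n'.join(lines[:n_val]) + '\n')
    with open(train_out, 'w') as f:
        f.write('\n'.join(lines[n_val:]) + '\n')

# Demo on a tiny fake list (the real cleaned_list.txt has ~455k lines).
with open('cleaned_list.txt', 'w') as f:
    for i in range(20):
        f.write(f'{i:04d}/img_{i}.jpg\n')
split_list('cleaned_list.txt', 'train_list.txt', 'val_list.txt', val_frac=0.1)
```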
For the dataset-preprocessing part, train.py is changed to:
# without Grayscale(), the input shape is not right
train_transforms = T.Compose([
    T.Grayscale(),
    T.RandomCrop(opt.input_shape[1:]),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.5], std=[0.5]),
])
train_dataset = torchvision.datasets.ImageFolder(opt.train_root, transform=train_transforms)
# train_dataset = Dataset(opt.train_root, opt.train_list, phase='train', input_shape=opt.input_shape)
trainloader = data.DataLoader(train_dataset,
                              batch_size=opt.train_batch_size,
                              shuffle=True,
                              num_workers=opt.num_workers)
Hi, I want to test your arcface code, but there is no cleaned_list.txt! How can I find it?
@akila12-dev Hi, here is the cleaned_list.txt. But if you download the dataset "CASIA-maxpy-clean" here, the cleaned_list.txt is no longer needed.