
Difficulty in understanding finetune.ipynb

santhoshnumberone opened this issue 3 years ago · 4 comments

I am referring to facenet-pytorch/examples/finetune.ipynb

I have a few doubts:

1) How is data structured for training?

The dataset should follow the VGGFace2/ImageNet-style directory layout. Modify data_dir to point to the location of the dataset you wish to fine-tune on.

data_dir = '../data/test_images'

batch_size = 32
epochs = 8
workers = 0 if os.name == 'nt' else 8

Is my understanding correct, or is it something else? According to this documentation about the ImageNet directory structure, #275, the test_images directory contains the two directories train and val.

The train and val directories contain the directories Person1, Person2, Person3, Person4, ..., according to: a) Finetune a Facial Recognition Classifier to Recognize your Face using PyTorch (Directory Structure), b) ImageNet train/val images organized in the following folder structure, c) Example use case - ImageNet subset, and d) How to prepare Imagenet dataset for Image Classification.

  train/
  ├── Person1
  │   ├── Person1_01.JPEG
  │   ├── Person1_02.JPEG
  │   ├── ......
  ├── Person2
  │   ├── Person2_01.JPEG
  │   ├── Person2_02.JPEG
  │   ├── ......
  ├── Person3
  │   ├── Person3_01.JPEG
  │   ├── Person3_02.JPEG
  │   ├── ......
  ├── Person4
  │   ├── Person4_01.JPEG
  │   ├── Person4_02.JPEG
  │   ├── ......
  ├── ......
  val/
  ├── Person1
  │   ├── Person1_v01.JPEG
  │   ├── Person1_v02.JPEG
  │   ├── ......
  ├── Person2
  │   ├── Person2_v01.JPEG
  │   ├── Person2_v02.JPEG
  │   ├── ......
  ├── Person3
  │   ├── Person3_v01.JPEG
  │   ├── Person3_v02.JPEG
  │   ├── ......
  ├── Person4
  │   ├── Person4_v01.JPEG
  │   ├── Person4_v02.JPEG
  │   ├── ......
  ├── ......
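
For reference, a minimal sketch (paths are illustrative) of how torchvision's ImageFolder turns this layout into classes:

from torchvision import datasets

# ImageFolder treats every immediate subdirectory as one class,
# so each Person* folder under train/ becomes its own label.
train_dataset = datasets.ImageFolder('../data/test_images/train')
print(train_dataset.classes)        # ['Person1', 'Person2', 'Person3', 'Person4', ...]
print(train_dataset.class_to_idx)   # {'Person1': 0, 'Person2': 1, ...}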

2) Inception Resnet V1 module

resnet = InceptionResnetV1(
    classify=True,
    pretrained='vggface2',
    num_classes=len(dataset.class_to_idx)
).to(device)

According to help(InceptionResnetV1):

|      classify {bool} -- Whether the model should output classification probabilities or feature
|          embeddings. (default: {False})
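
For context, a minimal sketch of the two settings (num_classes=5 is just an example value here; the notebook derives it from the dataset):

from facenet_pytorch import InceptionResnetV1

# classify=False (the default): the forward pass returns 512-d face embeddings.
embedder = InceptionResnetV1(pretrained='vggface2').eval()

# classify=True with num_classes set: the forward pass returns class logits,
# which is what a classifier being fine-tuned needs.
classifier = InceptionResnetV1(
    classify=True,
    pretrained='vggface2',
    num_classes=5,
)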

2) i) Shouldn't classify=False (for training here), as used in Get pretrained ResNet on VGGFace2 dataset?

2) ii) num_classes value?

According to Out[1], there are 11 classes that InceptionResnetV1 was trained on.

['Adrien_Brody','Alejandro_Toledo','Angelina_Jolie','Arnold_Schwarzenegger','Carlos_Moya','Charles_Moose','James_Blake','Jennifer_Lopez','Michael_Chaykowsky','Roh_Moo-hyun','Venus_Williams']

If I want to re-train InceptionResnetV1 using transfer learning on a new set of data limited to 5 of the 11 previously trained people, how do I change the 11-person output to only 5 people? Does dataset.class_to_idx update automatically based on data_dir? But data_dir has only the two directories train and val, which is where much of my confusion lies.

So, is this correct?

data_dir = '../data/test_images'

dataset = datasets.ImageFolder(data_dir, transform=transforms.Resize((512, 512)))
# Pair each source image path with a save path in a parallel '_cropped'
# directory tree (used when MTCNN writes out the detected faces).
dataset.samples = [
    (p, p.replace(data_dir, data_dir + '_cropped'))
        for p, _ in dataset.samples
]

resnet = InceptionResnetV1(
    classify=True,
    pretrained='vggface2',
    num_classes=len(dataset.class_to_idx)
).to(device)

Does dataset.class_to_idx depend on the number of directories inside the train directory?
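
To illustrate the source of the confusion, a sketch assuming the two-level layout described above (the paths are assumptions): if data_dir itself contains only train and val, ImageFolder sees exactly two classes, whereas pointing it at the train folder yields one class per person.

from torchvision import datasets

# If data_dir's immediate children are train/ and val/, those two folder
# names become the "classes".
wrong = datasets.ImageFolder('../data/test_images')
print(wrong.class_to_idx)        # {'train': 0, 'val': 1}

# Pointing ImageFolder at the folder whose children are the Person* folders
# gives one class per person instead.
right = datasets.ImageFolder('../data/test_images/train')
print(len(right.class_to_idx))   # number of Person* folders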

This explanation, Using ImageFolder and DataLoader, is very nice:

train_dir = 'train'
valid_dir = 'valid'
test_dir  = 'test'

dirs = {'train': train_dir,  'valid': valid_dir, 'test' : test_dir}

image_datasets = {x: datasets.ImageFolder(dirs[x],   transform=data_transforms[x]) for x in ['train', 'valid', 'test']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=32, shuffle=True) for x in ['train', 'valid', 'test']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid', 'test']}

class_names = image_datasets['train'].classes

Shouldn't it be

train_dir = 'data/train'
valid_dir = 'data/val'

dirs = {'train': train_dir,  'valid': valid_dir}

image_datasets = {x: datasets.ImageFolder(dirs[x],   transform=data_transforms[x]) for x in ['train', 'valid']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=32, shuffle=True) for x in ['train', 'valid']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid']}

class_names = image_datasets['train'].classes

Then what will happen to this code? Are images being picked randomly in this code?

dataset = datasets.ImageFolder(data_dir + '_cropped', transform=trans)

# Shuffle all sample indices, then take the first 80% for training
# and the remaining 20% for validation.
img_inds = np.arange(len(dataset))
np.random.shuffle(img_inds)
train_inds = img_inds[:int(0.8 * len(img_inds))]
val_inds = img_inds[int(0.8 * len(img_inds)):]

# SubsetRandomSampler draws samples randomly (without replacement) from the
# given index subset, so each loader only ever sees its own split.
train_loader = DataLoader(
    dataset,
    num_workers=workers,
    batch_size=batch_size,
    sampler=SubsetRandomSampler(train_inds)
)
val_loader = DataLoader(
    dataset,
    num_workers=workers,
    batch_size=batch_size,
    sampler=SubsetRandomSampler(val_inds)
)

3) Fine-tuning/transfer learning: only the last layer is to be tuned

Looking at this: How can I disable all layers gradient expect the last layer in Pytorch?

I found the last layer of resnet:

resnet = InceptionResnetV1(
    classify=True,
    pretrained='vggface2',
    num_classes=len(dataset.class_to_idx)
).to(device)

for name, param in resnet.named_parameters():
    print(name, param.requires_grad)

Named parameters of resnet:

conv2d_1a.conv.weight True
conv2d_1a.bn.weight True
conv2d_1a.bn.bias True
conv2d_2a.conv.weight True
......
......
last_bn.weight True
last_bn.bias True
logits.weight True
logits.bias True

Froze all the layers so they are not trained again on the new data:

for param in resnet.parameters():
    param.requires_grad = False

Unfroze the last layer so it is trained again on the new data:

#Unfreeze the last layer
for param in resnet.logits.parameters():
    param.requires_grad = True

Named parameters of resnet after freezing all layers apart from the last layer for training on new data:

conv2d_1a.conv.weight False
conv2d_1a.bn.weight False
conv2d_1a.bn.bias False
conv2d_2a.conv.weight False
......
......
last_bn.weight False
last_bn.bias False
logits.weight True
logits.bias True

So shouldn't this be performed before optimizer = optim.Adam(resnet.parameters(), lr=0.001)?
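
For illustration, a minimal sketch of that ordering: freeze first, then build the optimizer over only the still-trainable parameters (leaving logits unfrozen, per the parameter listing above):

import torch.optim as optim

# Freeze everything, then unfreeze only the final classification layer.
for param in resnet.parameters():
    param.requires_grad = False
for param in resnet.logits.parameters():
    param.requires_grad = True

# Create the optimizer *after* freezing, and hand it only the parameters
# that are still trainable, so Adam never tracks the frozen weights.
trainable_params = [p for p in resnet.parameters() if p.requires_grad]
optimizer = optim.Adam(trainable_params, lr=0.001)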

santhoshnumberone · May 29 '21

I think the structure the author set for the test_images folder doesn't contain the two directories train and val. It only contains person_1, person_2, ..., and each person_i has only 1 image.

namdh15 · May 31 '21

I have the same question about fine-tuning. The fine-tuned model performs terribly when trained on my dataset, so finetune.ipynb is confusing. Looking forward to anyone's reply. Thanks @santhoshnumberone @lilluv

Shame-fight · Jun 04 '21

I have tried timesler's finetune notebook, and it actually works well on my dataset (~90% accuracy with 49 classes). Some things to note from my experience:

  • The dataset does not need to be divided into train and val sub-directories - a single train folder is enough, since his code splits the fine-tuning dataset, using 80% for training.
  • The dataset needs to be big enough - in my experiments, I used some augmentation techniques (Rotation, RandomBrightnessContrast, etc.) to grow the training set from 700 images across 49 classes to ~10k images (200 images per class). You could refer to albumentations as one way to do it (see the sketch below). Hope it helps ^^
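
A minimal offline-augmentation sketch with albumentations; the transform parameters, copy count, and paths are illustrative assumptions, not the exact settings used above:

import os
import cv2
import albumentations as A

# Illustrative augmentation pipeline (Rotate, RandomBrightnessContrast, flip).
transform = A.Compose([
    A.Rotate(limit=20, p=0.7),
    A.RandomBrightnessContrast(p=0.5),
    A.HorizontalFlip(p=0.5),
])

src_dir = 'data/train/Person1'      # hypothetical class folder
dst_dir = 'data/train_aug/Person1'
os.makedirs(dst_dir, exist_ok=True)

for fname in os.listdir(src_dir):
    # albumentations expects RGB numpy arrays.
    img = cv2.cvtColor(cv2.imread(os.path.join(src_dir, fname)), cv2.COLOR_BGR2RGB)
    for k in range(4):              # write a few augmented copies per source image
        aug = transform(image=img)['image']
        cv2.imwrite(os.path.join(dst_dir, f'aug{k}_{fname}'),
                    cv2.cvtColor(aug, cv2.COLOR_RGB2BGR))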

windspirit95 · Jul 26 '21

I have tried timesler's finetune notebook, and it actually works well on my dataset (~90% accuracy with 49 classes). Some things to note from my experience:

* The dataset does not need to be divided into train and val sub-directories - a single train folder is enough, since his code splits the fine-tuning dataset, using 80% for training.

* The dataset needs to be big enough - in my experiments, I used some augmentation techniques (Rotation, RandomBrightnessContrast, etc.) to grow the training set from 700 images across 49 classes to ~10k images (200 images per class). You could refer to albumentations as one way to do it.
  Hope it helps ^^

Do we need to perform any kind of preprocessing on the dataset before proceeding, or does the fine-tuning script take care of that by itself? Thanks in advance :)
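
For reference, a minimal sketch of the face-cropping preprocessing pass, reusing the (input path, save path) samples rewrite quoted earlier in this thread; the MTCNN settings here are assumptions:

from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(image_size=160, margin=0)

# dataset.samples was rewritten to (input_path, save_path) pairs, so each
# detected face is written into the parallel *_cropped directory tree.
for input_path, save_path in dataset.samples:
    img = Image.open(input_path)
    mtcnn(img, save_path=save_path)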

elon-trump · Nov 09 '23