
How do I prepare my input data? I have a folder of pngs, what do I do with them?

xjcl opened this issue 3 years ago • 10 comments

I tried putting the images directly in the data/ directory as instructed on the README.md page, but this just leads to the following error:

FileNotFoundError: [Errno 2] No such file or directory: './data/faces_emore/lfw/meta/sizes'

Someone in this issue suggested using prepare_data.py from the following repository (which, by the way, is also included in this repo under backup/):

https://github.com/TreB1eN/InsightFace_Pytorch#323-prepare-dataset--for-training

But that doesn't seem to work with plain .png files either; it appears to be looking for some sort of .rec file:

mxnet.base.MXNetError: [17:27:25] src/io/local_filesys.cc:209: Check failed: allow_null:  LocalFileSystem::Open "data/faces_emore/train.rec": No such file or directory

Any advice? Thanks for your attention.

xjcl avatar Nov 07 '20 16:11 xjcl

The code is quite confusing, as it contains several different methods for alignment and normalization. After a lot of experiments, I found that the pre-trained network IR-152 works best if the images are preprocessed as follows:

  • Read in image
  • If your image is grayscale, convert it to a 3-channel color image
  • Perform face detection using the script align/detector.py to obtain a face bounding box
  • Extend the face bounding box to a square box using the function convert_to_square in script align/box_utils.py
  • Crop image to square box
  • Resize image to 128x128
  • Center crop image to 112x112
  • If necessary (if image was read in by cv2) convert image from BGR to RGB
  • Convert image to tensor
  • Normalize mean and std to the values used in training (for IR-152, that's mean = [0.5, 0.5, 0.5] and std = [0.5, 0.5, 0.5])

That is essentially the procedure implemented in extract_feature_v1.py and extract_feature_v2.py, except that those scripts assume the images have already been cropped to the square face bounding box. You can use the following code:

import numpy as np
import torch
import torchvision.transforms as transforms
from PIL import Image
import detector   # align/detector.py
import box_utils  # align/box_utils.py

def l2_norm(input, axis=1):
    # L2-normalize the embeddings along the given axis
    norm = torch.norm(input, 2, axis, True)
    output = torch.div(input, norm)
    return output

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])

# backbone must already be initialized and loaded, and inpiclist must be a list
# of image file paths (see the follow-up comments below).
for inpicfile in inpiclist:
    pil_image = Image.open(inpicfile)
    if pil_image.mode != 'RGB':
        # convert grayscale (or palette) images to a 3-channel RGB image
        rgbimg = Image.new("RGB", pil_image.size)
        rgbimg.paste(pil_image)
        pil_image = rgbimg
    MIN_FACE_SIZE = 0.2 * float(pil_image.size[0])  # adjust this to your needs
    detector.detect_faces(pil_image, min_face_size=MIN_FACE_SIZE)
    box = box_utils.convert_to_square(box)
    a, b, c, d = np.array(box[0, 0:4], dtype=int)
    croppedImg = pil_image.crop((a, b, c, d))
    croppedImg = croppedImg.resize((128, 128), resample=Image.BILINEAR)
    croppedImg = croppedImg.crop((8, 8, 120, 120))  # center crop to 112x112
    inputTensor = transform(croppedImg)
    inputTensor = inputTensor.unsqueeze(0)
    with torch.no_grad():
        embedding = l2_norm(backbone(inputTensor.to('cpu')).cpu())  # change 'cpu' to your CUDA device

JoMe2704 avatar Dec 07 '20 19:12 JoMe2704

Thanks a lot for going through the effort to write this answer. I've decided to use the Azure Face API rather than rolling my own, but your answer might be useful for someone else. I can't edit your answer; could you perhaps add code formatting to the Python code?

xjcl avatar Dec 21 '20 11:12 xjcl

@JoMe2704 I have a question: does transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]) map the pixel values of the 3 channels into the range [0, 1]? If so, and I want to map them into the range [-1, +1] instead, will transforms.Normalize(mean=[0, 0, 0], std=[1, 1, 1]) work?

AGenchev avatar Jan 02 '21 17:01 AGenchev

No, transforms.ToTensor() already maps the pixel values to tensor values in the range [0, 1]. The subsequent transformation transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]) maps each tensor value x to (x - 0.5)/0.5. See https://jhui.github.io/2018/02/09/PyTorch-Data-loading-preprocess_torchvision/ for an explanation. Thus, the resulting tensor has values in the range [-1, 1].

The transform transforms.Normalize(mean=[0, 0, 0], std=[1, 1, 1]) would leave the tensor values unchanged (x -> (x - 0)/1), resulting in values in the range [0, 1].
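
For illustration, a minimal sketch of that mapping (the tiny two-pixel test image is made up purely to show the value ranges that ToTensor and Normalize produce):

import numpy as np
import torchvision.transforms as transforms
from PIL import Image

# A made-up 1x2 RGB image containing one black and one white pixel.
img = Image.fromarray(np.array([[[0, 0, 0], [255, 255, 255]]], dtype=np.uint8), mode='RGB')

to_tensor = transforms.ToTensor()
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

t = to_tensor(img)       # ToTensor scales pixel values to [0, 1]
print(t.min(), t.max())  # tensor(0.) tensor(1.)

n = normalize(t)         # Normalize applies x -> (x - 0.5) / 0.5, giving [-1, 1]
print(n.min(), n.max())  # tensor(-1.) tensor(1.)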

JoMe2704 avatar Jan 03 '21 14:01 JoMe2704

Hi~ I don't know what these extracted features are for. I want to get the output label for an input image. What should I do?

DrewdropLife avatar May 31 '21 10:05 DrewdropLife

Hi @JoMe2704 @changxinC, I am trying to train on my own dataset, but I can't figure out the data format required for training. Currently my data is inside D:/face.evoLVe.PyTorch/data/dataV1/, organized as follows:

dataV1/
    id1/
        1.jpg
        ...
    id2/
        1.jpg
        ...
    ...

The data is already aligned and resized to 112 using the align script provided in the repo. When I run train.py, I get a file-not-found error. A lot of people seem to be facing the same issue and cannot work out the correct data format.

It would help a lot of people if you could explain how to get the correct dataset format for training. Help would be much appreciated, thank you.

sriktrako avatar Jul 27 '21 03:07 sriktrako

Hi~ I don't know what these extracted features are for. I want to get the output label for an input image. What should I do?

While the network has been trained to classify the subjects (persons) in the training set, that doesn't help you when you look at images of persons who aren't in the training set. To use the network for images of arbitrary persons, you use the features (embeddings) of the final layer. These can be used to measure the similarity of faces: the Euclidean distance between the embeddings of two face images is a measure of their dissimilarity. Depending on the variance of your images (face pose, facial expression, illumination, sharpness, ageing), you can set a threshold on the distance to decide whether two images depict the same person, as in the sketch below.
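
As a minimal sketch of such a comparison (the threshold value here is just a placeholder; you would tune it on your own validation data):

import torch

def is_same_person(emb1, emb2, threshold=1.2):
    # emb1, emb2: L2-normalized embeddings from the backbone, e.g. shape [1, 512].
    # The threshold of 1.2 is only a placeholder and must be tuned on your own data.
    dist = torch.dist(emb1, emb2, p=2).item()  # Euclidean distance
    return dist < threshold, dist

# Usage with two embeddings computed by the extraction loop above:
# same, dist = is_same_person(embedding_a, embedding_b)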

JoMe2704 avatar Aug 09 '21 13:08 JoMe2704

It would help a lot of people if you could explain how to get the correct dataset format for training.

Sorry, I have no idea. I used my own images and scripts, and I haven't done any training yet.

JoMe2704 avatar Aug 09 '21 13:08 JoMe2704


Hello, when I use the code you mentioned above, I set inpiclist to the path of my picture (/home/face.evoLVe/max/150.jpg), but when I run the code it reports that 'box' is not defined and 'backbone' is not defined. How can I solve this? Thank you!

chocokassy avatar Aug 28 '21 11:08 chocokassy


This was just the relevant code for face alignment, not a complete script. You need to initialize and load the model first.

# Load torch model
from model_irse import IR_152  # This example uses IR-152
MODELPATH = '/home/face.evoLVe/model_IR152.pth'  # change this to the path of your model file
backbone = IR_152((112, 112))
backbone.load_state_dict(torch.load(MODELPATH, map_location=torch.device('cpu')))  # change this to CUDA if you use a GPU
backbone.eval()
torch.set_grad_enabled(False)

inpiclist should be a list of file paths, not a single file path.

import os

IMGPATH = '/home/face.evoLVe/images/'  # change this to your image folder
inpiclist = [os.path.join(IMGPATH, f) for f in os.listdir(IMGPATH) if f.endswith('.jpg')]

There was also an error in my code, on the line right before the one where you get the error; it should read: box, points = detector.detect_faces(pil_image, min_face_size=MIN_FACE_SIZE)

So, here is the complete code:


import os

import numpy as np
import torch
import torchvision.transforms as transforms
from PIL import Image

import detector   # align/detector.py
import box_utils  # align/box_utils.py

def l2_norm(input, axis=1):
    # L2-normalize the embeddings along the given axis
    norm = torch.norm(input, 2, axis, True)
    output = torch.div(input, norm)
    return output

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])

# Load torch model
from model_irse import IR_152  # This example uses IR-152
MODELPATH = '/home/face.evoLVe/model_IR152.pth'  # change this to the path of your model file
backbone = IR_152((112, 112))
backbone.load_state_dict(torch.load(MODELPATH, map_location=torch.device('cpu')))  # change this to CUDA if you use a GPU
backbone.eval()
torch.set_grad_enabled(False)

IMGPATH = '/home/face.evoLVe/images/'  # change this to your image folder
inpiclist = [os.path.join(IMGPATH, f) for f in os.listdir(IMGPATH) if f.endswith('.jpg')]

for inpicfile in inpiclist:
    pil_image = Image.open(inpicfile)
    if pil_image.mode != 'RGB':
        # convert grayscale (or palette) images to a 3-channel RGB image
        rgbimg = Image.new("RGB", pil_image.size)
        rgbimg.paste(pil_image)
        pil_image = rgbimg
    MIN_FACE_SIZE = 0.2 * float(pil_image.size[0])  # adjust this to your needs
    box, points = detector.detect_faces(pil_image, min_face_size=MIN_FACE_SIZE)
    box = box_utils.convert_to_square(box)
    a, b, c, d = np.array(box[0, 0:4], dtype=int)
    croppedImg = pil_image.crop((a, b, c, d))
    croppedImg = croppedImg.resize((128, 128), resample=Image.BILINEAR)
    croppedImg = croppedImg.crop((8, 8, 120, 120))  # center crop to 112x112
    inputTensor = transform(croppedImg)
    inputTensor = inputTensor.unsqueeze(0)
    with torch.no_grad():
        embedding = l2_norm(backbone(inputTensor.to('cpu')).cpu())  # change 'cpu' to your CUDA device

JoMe2704 avatar Sep 01 '21 12:09 JoMe2704