
Hi! How do you train Hopenet? The loss does not converge

Open · wqz960 opened this issue 4 years ago · 6 comments

  1. How do you generate the image list the original code expects? (A sketch follows after the log below.)

  2. I rewrote the dataset preprocessing based on yours, but the loss does not converge. Here are the dataset class and the training log:

```python
import os
from glob import glob

import numpy as np
import scipy.io as sio
import torch
from PIL import Image, ImageFilter
from torch.utils.data import Dataset


class Face_300W_LP(Dataset):
    def __init__(self, data_dir, transform, img_ext='.jpg', annot_ext='.mat', image_mode='RGB'):
        self.data_dir = data_dir
        self.transform = transform
        self.img_ext = img_ext
        self.annot_ext = annot_ext
        # Gather every image from the four 300W-LP subsets.
        self.folders = ["AFW", "HELEN", "IBUG", "LFPW"]
        self.img_list = []
        for folder in self.folders:
            self.img_list += glob(os.path.join(data_dir, folder, "*" + img_ext))

    def __getitem__(self, idx):
        img = Image.open(self.img_list[idx]).convert("RGB")
        meta = sio.loadmat(self.img_list[idx][:-4] + self.annot_ext)

        # Loosely crop the face around the 2D landmarks.
        pt2d = meta['pt2d']
        x_min = min(pt2d[0, :])
        y_min = min(pt2d[1, :])
        x_max = max(pt2d[0, :])
        y_max = max(pt2d[1, :])

        # k = 0.2 to 0.40
        k = np.random.random_sample() * 0.2 + 0.2
        x_min -= 0.6 * k * abs(x_max - x_min)
        y_min -= 2 * k * abs(y_max - y_min)
        x_max += 0.6 * k * abs(x_max - x_min)
        y_max += 0.6 * k * abs(y_max - y_min)
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))

        # The pose is stored in radians; convert to degrees.
        pose = meta['Pose_Para'][0][:3]
        pitch = pose[0] * 180 / np.pi
        yaw = pose[1] * 180 / np.pi
        roll = pose[2] * 180 / np.pi

        # ds = 1 + np.random.randint(0, 4) * 5
        # original_size = img.size
        # img = img.resize((img.size[0] // ds, img.size[1] // ds), resample=Image.NEAREST)
        # img = img.resize((original_size[0], original_size[1]), resample=Image.NEAREST)

        # Flip?
        rnd = np.random.random_sample()
        if rnd < 0.5:
            yaw = -yaw
            roll = -roll
            img = img.transpose(Image.FLIP_LEFT_RIGHT)

        # Blur?
        rnd = np.random.random_sample()
        if rnd < 0.05:
            img = img.filter(ImageFilter.BLUR)

        # Bin values: 66 bins of 3 degrees covering [-99, 99).
        bins = np.array(range(-99, 102, 3))
        binned_pose = np.digitize([yaw, pitch, roll], bins) - 1

        # Get target tensors.
        labels = binned_pose
        cont_labels = torch.FloatTensor([yaw, pitch, roll])

        if self.transform is not None:
            img = self.transform(img)

        return img, labels, cont_labels, self.img_list[idx]

    def __len__(self):
        return len(self.img_list)
```

```
Epoch [1/5], Iter [100/3826] Losses: Yaw 6.9623, Pitch 3.3666, Roll 3.3280
Epoch [1/5], Iter [200/3826] Losses: Yaw 7.0173, Pitch 3.7913, Roll 4.0655
Epoch [1/5], Iter [300/3826] Losses: Yaw 7.0390, Pitch 3.4928, Roll 3.3288
Epoch [1/5], Iter [400/3826] Losses: Yaw 6.6685, Pitch 3.4066, Roll 3.3632
Epoch [1/5], Iter [500/3826] Losses: Yaw 6.0093, Pitch 2.7562, Roll 2.8081
Epoch [1/5], Iter [600/3826] Losses: Yaw 7.0399, Pitch 3.5185, Roll 2.9296
Epoch [1/5], Iter [700/3826] Losses: Yaw 6.9090, Pitch 3.0738, Roll 2.7689
Epoch [1/5], Iter [800/3826] Losses: Yaw 7.5161, Pitch 3.4587, Roll 3.0765
Epoch [1/5], Iter [900/3826] Losses: Yaw 7.7006, Pitch 2.9306, Roll 2.8257
Epoch [1/5], Iter [1000/3826] Losses: Yaw 7.7306, Pitch 2.6414, Roll 2.8222
Epoch [1/5], Iter [1100/3826] Losses: Yaw 7.0110, Pitch 2.9134, Roll 3.1434
Epoch [1/5], Iter [1200/3826] Losses: Yaw 7.8895, Pitch 2.7056, Roll 2.9635
Epoch [1/5], Iter [1300/3826] Losses: Yaw 7.8618, Pitch 3.0785, Roll 2.8754
Epoch [1/5], Iter [1400/3826] Losses: Yaw 6.9509, Pitch 3.1440, Roll 2.3867
Epoch [1/5], Iter [1500/3826] Losses: Yaw 7.2655, Pitch 3.5011, Roll 2.7198
Epoch [1/5], Iter [1600/3826] Losses: Yaw 6.9521, Pitch 2.5396, Roll 3.0407
Epoch [1/5], Iter [1700/3826] Losses: Yaw 6.1507, Pitch 3.1033, Roll 2.3047
Epoch [1/5], Iter [1800/3826] Losses: Yaw 8.0398, Pitch 3.2253, Roll 3.1032
Epoch [1/5], Iter [1900/3826] Losses: Yaw 6.5448, Pitch 2.6368, Roll 2.5555
Epoch [1/5], Iter [2000/3826] Losses: Yaw 7.5095, Pitch 3.2314, Roll 3.2987
Epoch [1/5], Iter [2100/3826] Losses: Yaw 6.4053, Pitch 2.8100, Roll 2.7238
Epoch [1/5], Iter [2200/3826] Losses: Yaw 7.3014, Pitch 3.2478, Roll 2.8233
Epoch [1/5], Iter [2300/3826] Losses: Yaw 7.7167, Pitch 2.5214, Roll 2.9376
Epoch [1/5], Iter [2400/3826] Losses: Yaw 7.1232, Pitch 2.5696, Roll 2.3332
Epoch [1/5], Iter [2500/3826] Losses: Yaw 6.5463, Pitch 2.9003, Roll 2.8601
Epoch [1/5], Iter [2600/3826] Losses: Yaw 7.1496, Pitch 3.1998, Roll 3.0408
Epoch [1/5], Iter [2700/3826] Losses: Yaw 8.3173, Pitch 2.7032, Roll 2.5530
Epoch [1/5], Iter [2800/3826] Losses: Yaw 7.1783, Pitch 2.6990, Roll 2.5785
Epoch [1/5], Iter [2900/3826] Losses: Yaw 7.8416, Pitch 2.8020, Roll 3.3089
```
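Regarding question 1: the original repo's Pose_300W_LP loader reads a filename-list text file, one relative path per line without the file extension (datasets.py joins each entry with data_dir and img_ext). A minimal sketch of building that list under those assumptions; the `300W_LP` root path and the restriction to the four non-flipped subsets are placeholders:

```python
# Sketch: build the filename list the original Pose_300W_LP loader
# expects -- one relative path per line, without the file extension.
# Assumptions: standard 300W-LP folder layout; '300W_LP' is a
# placeholder for your dataset root.
import os
from glob import glob

root = '300W_LP'
with open(os.path.join(root, 'filename_list.txt'), 'w') as f:
    for folder in ['AFW', 'HELEN', 'IBUG', 'LFPW']:
        for path in sorted(glob(os.path.join(root, folder, '*.jpg'))):
            rel = os.path.relpath(path, root)          # e.g. AFW/AFW_134212_1_0.jpg
            f.write(os.path.splitext(rel)[0] + '\n')   # strip the extension
```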

wqz960 · Dec 14 '20

Hello, can you tell me the lowest losses you reached for yaw, pitch, and roll? Thanks @natanielruiz

wqz960 · Dec 14 '20

Hi, I ran into the same problem. Have you solved it?

dfzsgjshzfj · Dec 18 '20

@dfzsgjshzfj Using the original data preprocessing code, the loss drops to around 1.5, but the results on AFLW2000 are bad. I have given up on it and switched to another repo.

wqz960 · Dec 18 '20

Hmm. My losses are about yaw 4.0, pitch 2.8, roll 2.7. What is the original data preprocessing code you mentioned? I did not find it in the repo.

dfzsgjshzfj · Dec 18 '20

Hi, I got similar results to yours using the 300W-LP dataset. Is this result correct? In the paper I saw the Multi-Loss ResNet50 results on the AFLW2000 dataset. Is the Multi-Loss reported in the paper the same loss as the one obtained during training?
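For what it's worth, the numbers in the paper's AFLW2000 table are mean absolute errors in degrees, computed from the expected value of the binned softmax output (as in test_hopenet.py), not the training loss printed above, which is cross-entropy plus a weighted MSE term. A minimal sketch of that per-angle multi-loss, assuming 66 bins of 3° and the alpha weighting from train_hopenet.py (the function name here is illustrative):

```python
# Sketch of the per-angle multi-loss used in train_hopenet.py
# (assumptions: 66 bins of 3 degrees; alpha is the regression weight,
# default 0.001 in the repo's argparse).
import torch
import torch.nn.functional as F

def angle_multi_loss(logits, bin_label, cont_label, alpha=0.001):
    # Classification term: cross-entropy over the 66 angle bins.
    loss_cls = F.cross_entropy(logits, bin_label)

    # Expected angle in degrees from the softmax: sum_i p_i * (3*i - 99).
    idx = torch.arange(66, dtype=torch.float32, device=logits.device)
    pred_deg = torch.sum(F.softmax(logits, dim=1) * idx, dim=1) * 3 - 99

    # Regression term: MSE between expected angle and continuous label.
    loss_reg = F.mse_loss(pred_deg, cont_label)
    return loss_cls + alpha * loss_reg
```

At test time the paper's MAE corresponds to `torch.mean(torch.abs(pred_deg - cont_label))` on the evaluation set, so it is not directly comparable to the training losses printed in the log.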

kellen5l · Jan 29 '21

> @dfzsgjshzfj Using the original data preprocessing code, the loss drops to around 1.5, but the results on AFLW2000 are bad. I have given up on it and switched to another repo.

Can you mention the repo here?

GKG1312 · Mar 06 '24