
Questions on data preprocessing

Open LinghaoChan opened this issue 1 year ago • 14 comments

Hi @EricGuo5513 , thanks for your efforts in providing such a significant dataset to the community. I would like to know some details about your data preprocessing stage, and I have run into the problems listed below. I hope to get your official answer.

  • For the face-Z+ correction in ‘motion_representation.ipynb’, I notice you perform this operation several times, and I am confused about the purpose of each of these operations.

    • The first time, in def process_file(positions, feet_thre):, the uniform_skeleton function uses the inverse_kinematics_np function: https://github.com/EricGuo5513/HumanML3D/blob/99b33e1cc7826ae96b0ee11a734453e250e5e75f/common/skeleton.py#L84. You calculate ‘root_quat’ to correct the facing direction. What is the purpose? Besides, why is root_quat[0] initialized as (1, 0, 0, 0)? I noticed conflicting changes in commit https://github.com/EricGuo5513/HumanML3D/commit/ab5b332c3148ec669da4c55ad119e0d73861b867 and commit https://github.com/EricGuo5513/HumanML3D/commit/3bbfc2ed3cf25df366c7bceb32c7a4762f39af3d.

    • The second time, you perform it in the process_file function in ‘In[3]’. What is the purpose?

      '''All initially face Z+'''
      r_hip, l_hip, sdr_r, sdr_l = face_joint_indx
      across1 = root_pos_init[r_hip] - root_pos_init[l_hip]
      across2 = root_pos_init[sdr_r] - root_pos_init[sdr_l]
      across = across1 + across2
      across = across / np.sqrt((across ** 2).sum(axis=-1))[..., np.newaxis]
      # forward (3,), rotate around y-axis
      forward_init = np.cross(np.array([[0, 1, 0]]), across, axis=-1)
      # forward (3,)
      forward_init = forward_init / np.sqrt((forward_init ** 2).sum(axis=-1))[..., np.newaxis]
      
      #     print(forward_init)
      
      target = np.array([[0, 0, 1]])
      root_quat_init = qbetween_np(forward_init, target)
      root_quat_init = np.ones(positions.shape[:-1] + (4,)) * root_quat_init
      
      positions_b = positions.copy()
      
      positions = qrot_np(root_quat_init, positions)
      
    • The third time, you perform this operation in the ‘get_rifke’ function. What is the purpose?

      def get_rifke(positions):
      	'''Local pose'''
      	positions[..., 0] -= positions[:, 0:1, 0]
      	positions[..., 2] -= positions[:, 0:1, 2]
      	'''All pose face Z+'''
      	positions = qrot_np(np.repeat(r_rot[:, None], positions.shape[1], axis=1), positions)
      	return positions
      
    • The fourth time, you perform this operation on the velocity. What is the purpose, and why apply it to the velocity?

      '''Get Joint Velocity Representation'''
      # (seq_len-1, joints_num*3)
      local_vel = qrot_np(np.repeat(r_rot[:-1, None], global_positions.shape[1], axis=1), global_positions[1:] - global_positions[:-1])
      
    • Besides, I notice another similar operation in ‘get_cont6d_params’:

      velocity = qrot_np(r_rot[1:], velocity)     # todo: issue
      '''Root Angular Velocity'''
      # (seq_len - 1, 4)
      r_velocity = qmul_np(r_rot[1:], qinv_np(r_rot[:-1]))
      
  • Another question: Why do you use the global position to calculate local velocity?

    '''Get Joint Velocity Representation'''
    # (seq_len-1, joints_num*3)
    local_vel = qrot_np(np.repeat(r_rot[:-1, None], global_positions.shape[1], axis=1), global_positions[1:] - global_positions[:-1])
    

    and the local velocity of the root (dims 1, 2) differs from the value in the x-z velocity (dims 193, 195), as shown below:

[screenshot comparing the root local velocity with the x-z velocity values]

LinghaoChan avatar Jul 13 '23 08:07 LinghaoChan

Hi, thanks for your interest in our dataset. The following explains these operations one by one:

  1. In inverse_kinematics_np, there is no need to face Z+; we only need to extract the forward direction as the root rotation. Initializing root_quat[0] as (1, 0, 0, 0) is somewhat of a mistake. In my own post-processed data, all motions have already been adjusted to initially face Z+ at this step, so this initialization is only a double-check. However, it is a bug if you follow the provided script to obtain the data. I tried to fix it, but the fix would change the resulting data and the current version has been widely used, so I reverted the change. In any case, since our global rotation representation is velocity-based and this only changes the first frame, I don't think it makes a big difference in the final results.
  2. The second time, the purpose is to make all motions face Z+ at the beginning. This is a data processing step that gives all data a uniform initial facing direction; it basically rotates the whole motion by the heading of the first pose. Again, since our global rotation representation is velocity-based, I guess this step could be skipped, but it keeps things safe.
  3. The third and fourth times are not data processing. Here we want to disentangle the global rotation from the local rotation/position/velocity, so that the local rotation/position/velocity carry only global-invariant local information. This disentanglement is easier for the network to learn. That is why you can see that we cancel the global rotation for the positions and velocities of all poses (see the sketch after this list).
  4. In get_cont6d_params, we first get the rotation-invariant velocity and then get the root rotational velocity from the rotations. Again, we want to disentangle the root rotation from the root velocity.
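
A minimal NumPy sketch of the idea in points 2 and 3 (illustrative toy code, not the repository's; it assumes quaternions in (w, x, y, z) order and uses a stand-in qrot for qrot_np): once the root heading is cancelled, the per-joint features no longer depend on which way the whole motion faces.

import numpy as np

def qrot(q, v):
    # Rotate vectors v (..., 3) by unit quaternions q (..., 4) in (w, x, y, z) order.
    w, u = q[..., :1], q[..., 1:]
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def yaw_quat(angle):
    # Quaternion for a rotation of `angle` radians around the Y (up) axis.
    return np.array([np.cos(angle / 2), 0.0, np.sin(angle / 2), 0.0])

def to_local(positions, heading):
    # Remove the root x-z translation, then rotate every joint into the root's
    # facing frame (a single constant heading here, for simplicity).
    local = positions.copy()
    local[..., 0] -= local[:, 0:1, 0]
    local[..., 2] -= local[:, 0:1, 2]
    q = np.broadcast_to(yaw_quat(-heading), local.shape[:-1] + (4,))
    return qrot(q, local)

positions = np.random.default_rng(0).normal(size=(40, 22, 3))   # toy global joints (T, J, 3)
# The same motion turned by 90 degrees yields the same local features.
turned = qrot(np.broadcast_to(yaw_quat(np.pi / 2), positions.shape[:-1] + (4,)), positions)
print(np.allclose(to_local(positions, 0.0), to_local(turned, np.pi / 2)))   # True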

Global position for local velocity: I got the idea from the PFNN work. I guess it should be okay to obtain the local velocity from the local positions; actually, they may be identical. I haven't had time to validate this myself, but I don't think it makes a big difference. About the difference you observed: I didn't expect this. I guess it comes from a minor discrepancy between the two calculations, e.g. the root velocity uses qrot_np(r_rot[1:], xxx), while the local velocity uses qrot_np(np.repeat(r_rot[:-1, None], xxx)). In practice we only need to keep the root velocity, and during recovery you should always use the root velocity (dims 1, 2).
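
A small toy comparison of that discrepancy (illustrative code only, not the repository's; the heading values are made up and a plain 2-D yaw rotation stands in for qrot_np): rotating the same frame-to-frame root displacement by the next frame's rotation versus the previous frame's rotation gives different vectors whenever the facing direction changes between the two frames, which is exactly the kind of small mismatch seen between dims 1-2 and the corresponding entries of local_vel.

import numpy as np

def rotate_xz(angle, vec_xz):
    # Rotate 2-D (x, z) vectors by a yaw angle; the sign convention does not
    # matter for this comparison.
    c, s = np.cos(angle)[..., None], np.sin(angle)[..., None]
    x, z = vec_xz[..., :1], vec_xz[..., 1:]
    return np.concatenate([c * x - s * z, s * x + c * z], axis=-1)

rng = np.random.default_rng(1)
root_xz = np.cumsum(rng.normal(size=(6, 2)), axis=0)   # toy root trajectory in the x-z plane
heading = np.linspace(0.0, np.pi / 4, 6)               # the facing direction changes over time

disp = root_xz[1:] - root_xz[:-1]                      # frame-to-frame displacement

vel_next = rotate_xz(heading[1:], disp)                # analogue of qrot_np(r_rot[1:],  ...)
vel_prev = rotate_xz(heading[:-1], disp)               # analogue of qrot_np(r_rot[:-1], ...)

# The two agree only when the heading is constant between consecutive frames.
print(np.abs(vel_next - vel_prev).max())               # small but non-zero here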

Hope these clarify your concerns.

EricGuo5513 avatar Jul 13 '23 17:07 EricGuo5513

@EricGuo5513 I have a similar problem regarding the first point. Could you please tell me the effect of the following operation? Why do we need to calculate the root_quat?

'''Get Root Rotation'''
target = np.array([[0,0,1]]).repeat(len(forward), axis=0)
root_quat = qbetween_np(forward, target)
for chain in self._kinematic_tree:
    R = root_quat
    for j in range(len(chain) - 1):
        # (batch, 3)
        u = self._raw_offset_np[chain[j+1]][np.newaxis,...].repeat(len(joints), axis=0)
        # print(u.shape)
        # (batch, 3)
        v = joints[:, chain[j+1]] - joints[:, chain[j]]
        v = v / np.sqrt((v**2).sum(axis=-1))[:, np.newaxis]
        # print(u.shape, v.shape)
        rot_u_v = qbetween_np(u, v)

        R_loc = qmul_np(qinv_np(R), rot_u_v)

        quat_params[:,chain[j + 1], :] = R_loc
        R = qmul_np(R, R_loc)

@EricGuo5513 I am also confused by this. What is the purpose?

LinghaoChan avatar Jul 18 '23 07:07 LinghaoChan

@LinghaoChan @EricGuo5513 Could this be the source of the mismatch in body parts compared with the text reference that I mentioned in issue #85? It seems that not all motions initially face Z+. For HumanML3D skeleton samples whose poses don't face Z+, the text reference for the pose becomes incorrect: a motion executed with the right hand is described in the text as the left hand, and the same may happen with clockwise/counterclockwise, forward/backward, and so on.

rd20karim avatar Sep 06 '23 11:09 rd20karim

> Could this be the source of the mismatch in body parts compared with the text reference that I mentioned in issue #85? It seems that not all motions initially face Z+. [...]

Yep. I am still confused.

LinghaoChan avatar Sep 06 '23 12:09 LinghaoChan

I think there is a relation between this issue and issues #55, #20, #45, and #85. The Z+ initialization and swapping do not seem to work as intended, because there are still samples that don't face the camera view. When I run an animation for the text reference "a person waving with the left hand", the person is actually waving with the right hand. I don't know whether this somehow doesn't appear in the SMPL representation, or whether these samples were simply never visualized.

rd20karim avatar Sep 06 '23 12:09 rd20karim

@rd20karim Can you provide the file with the error, e.g. its filename?

LinghaoChan avatar Sep 06 '23 14:09 LinghaoChan

@LinghaoChan
All files where the person doesn't face the camera view seem to have this problem. For example, from the test set:

The skeleton faces the opposite view of the camera.

Raises his left arm instead of the right arm: sample id 158 / references: a person raises his right arm and then waves at someone, a person waiving looking straight and then turning attention to the left, a person raises their hand turns to their right while waving and then stops and lowers their hand

The right leg executes the motion instead of the left: sample id 55 / references: a person kicked with left leg, kicking foot with arms towards chest, a person holds both hands up in front of his face and then kicks with his left leg

rd20karim avatar Sep 06 '23 15:09 rd20karim

@rd20karim Your indices don't seem to match mine. Your id 158 corresponds to 002651 for me.

I visualized the unmirrored and mirrored motions (002651 and M002651). The results look good.

LinghaoChan avatar Sep 07 '23 12:09 LinghaoChan

@LinghaoChan The problem seems not to appear in the SMPL visualization, as I thought, but in the skeleton-based visualization using the 3D joint coordinates from the .npy files: the skeleton doesn't face the camera view, and the left/right parts are inverted relative to the description.

rd20karim avatar Sep 07 '23 12:09 rd20karim

@rd20karim Could you please share the code you use for visualization?

LinghaoChan avatar Sep 07 '23 12:09 LinghaoChan

@LinghaoChan Yes, here is the code; you may need to modify the path.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

sample_path = "./HumanML3D/new_joints/002651.npy"
joint_poses = np.load(sample_path)  # shape (T, 22, 3)
shift_scale = joint_poses.mean(0).mean(1)  # (unused)
x = joint_poses[:, :, 0]
y = joint_poses[:, :, 1]
z = joint_poses[:, :, 2]
min_x, min_z, min_y, max_x, max_z, max_y = x.min(), z.min(), y.min(), x.max(), z.max(), y.max()

def plot_frame_3d(x, y, z, fig=None, ax=None):
    if fig is None:
        fig = plt.figure()
    if ax is None:
        ax = plt.axes(projection='3d')
    ax.scatter(x, y, z, c='red', marker='.')
    ax.set_xlim3d([min_x, max_x])
    ax.set_ylim3d([min_z, max_z])
    ax.set_zlim3d([min_y, max_y])
    ax.set_xlabel('X Label')
    ax.set_ylabel('Y Label')
    ax.set_zlabel('Z Label')
    return ax, fig

def animate_3d(x, y, z, fps=20):
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    frames = x.shape[0]

    def animate(i):
        plt.cla()
        ax_f, fig_f = plot_frame_3d(x[i], y[i], z[i], fig, ax)
        return ax_f

    return animation.FuncAnimation(fig, animate,
                                   frames=frames, interval=1000. / float(fps), blit=False)

# note: y and z are swapped here (see the follow-up below)
anims = animate_3d(x, z, y)
anims.save("_test.mp4")

rd20karim avatar Sep 07 '23 14:09 rd20karim

@LinghaoChan The problem is solved. After the discussion with the author, I found that for HumanML3D (unlike KIT-ML) the y and z axes should not be swapped; instead, only the camera view should be changed, via the elevation and azimuth. This small detail makes a big difference in the visualization: swapping y and z produces another, mirrored version of the motion, which does not necessarily face the camera view.
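
For reference, a minimal matplotlib sketch of that fix (the file path and camera angles below are placeholders, not values from the dataset scripts): plot the joints with the axes exactly as stored in the .npy file and only move the camera with view_init, instead of swapping y and z.

import numpy as np
import matplotlib.pyplot as plt

joints = np.load("./HumanML3D/new_joints/002651.npy")   # (T, 22, 3)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
frame = joints[0]
# Keep (x, y, z) exactly as stored in the file -- no y/z swap.
ax.scatter(frame[:, 0], frame[:, 1], frame[:, 2], c='red', marker='.')
# Only adjust the camera; tune elevation/azimuth to taste.
ax.view_init(elev=120, azim=-90)
plt.savefig("frame0.png")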

rd20karim avatar Sep 11 '23 07:09 rd20karim

@rd20karim I am sorry for not replying to you in time. Thanks for your clarification.

LinghaoChan avatar Sep 11 '23 11:09 LinghaoChan

> The third and fourth times are not data processing. Here we want to disentangle the global rotation from the local rotation/position/velocity. [...] In practice we only need to keep the root velocity, and during recovery you should always use the root velocity (dims 1, 2).

For disentangling the root orientation, don't you have to use the inverse of the orientation? Also, r_rot = (1, 0, 0, 0), so why would rotating by it have any effect?

sohananisetty avatar Feb 23 '24 23:02 sohananisetty