HumanML3D
Questions on data preprocessing
Hi @EricGuo5513, thanks for your efforts in providing such a significant dataset to the community. I would like to know some details about your data preprocessing stage, and I ran into a few problems, listed below. I hope to get your official answer.
- For the face Z+ correction in 'motion_representation.ipynb', I notice you perform this operation several times. I am confused about the purpose of these operations.
- For the first time, in def process_file(positions, feet_thre), the uniform_skeleton function uses the inverse_kinematics_np function: https://github.com/EricGuo5513/HumanML3D/blob/99b33e1cc7826ae96b0ee11a734453e250e5e75f/common/skeleton.py#L84. You calculate 'root_quat' to correct the facing direction. What is the purpose? Besides, why initialize root_quat[0] as (1, 0, 0, 0)? I noticed conflicting changes in commit https://github.com/EricGuo5513/HumanML3D/commit/ab5b332c3148ec669da4c55ad119e0d73861b867 and commit https://github.com/EricGuo5513/HumanML3D/commit/3bbfc2ed3cf25df366c7bceb32c7a4762f39af3d.
- For the second time, you perform it in the process_file function in 'In[3]'. What is the purpose?
    '''All initially face Z+'''
    r_hip, l_hip, sdr_r, sdr_l = face_joint_indx
    across1 = root_pos_init[r_hip] - root_pos_init[l_hip]
    across2 = root_pos_init[sdr_r] - root_pos_init[sdr_l]
    across = across1 + across2
    across = across / np.sqrt((across ** 2).sum(axis=-1))[..., np.newaxis]
    # forward (3,), rotate around y-axis
    forward_init = np.cross(np.array([[0, 1, 0]]), across, axis=-1)
    # forward (3,)
    forward_init = forward_init / np.sqrt((forward_init ** 2).sum(axis=-1))[..., np.newaxis]
    # print(forward_init)
    target = np.array([[0, 0, 1]])
    root_quat_init = qbetween_np(forward_init, target)
    root_quat_init = np.ones(positions.shape[:-1] + (4,)) * root_quat_init
    positions_b = positions.copy()
    positions = qrot_np(root_quat_init, positions)
- For the third time, you perform this operation in the get_rifke function. What is the purpose?
    def get_rifke(positions):
        '''Local pose'''
        positions[..., 0] -= positions[:, 0:1, 0]
        positions[..., 2] -= positions[:, 0:1, 2]
        '''All pose face Z+'''
        positions = qrot_np(np.repeat(r_rot[:, None], positions.shape[1], axis=1), positions)
        return positions
- For the fourth time, you perform this operation on the velocity. What is the purpose, and why do it on the velocity?
    '''Get Joint Velocity Representation'''
    # (seq_len-1, joints_num*3)
    local_vel = qrot_np(np.repeat(r_rot[:-1, None], global_positions.shape[1], axis=1),
                        global_positions[1:] - global_positions[:-1])
- Besides, I notice another similar operation in get_cont6d_params:
    velocity = qrot_np(r_rot[1:], velocity)  # todo: issue
    '''Root Angular Velocity'''
    # (seq_len - 1, 4)
    r_velocity = qmul_np(r_rot[1:], qinv_np(r_rot[:-1]))
- Another question: why do you use the global position to calculate the local velocity?
    '''Get Joint Velocity Representation'''
    # (seq_len-1, joints_num*3)
    local_vel = qrot_np(np.repeat(r_rot[:-1, None], global_positions.shape[1], axis=1),
                        global_positions[1:] - global_positions[:-1])
Also, the local velocity of the root (dim=1, 2) is different from the value in the x-z velocity (dim=193, 195).
Hi, thanks for your interest in our dataset. The following explains these operations individually:
- In inverse_kinematics_np, there is no need to face Z+; we only need to extract the forward direction as the root rotation. Initializing root_quat[0] as (1, 0, 0, 0) is something of a mistake. In my own post-processed data, all motions should already have been adjusted to initially face Z+ at this step, so this initialization is only a double-check. However, it is a bug if you follow the provided script to obtain the data. I tried to fix it, but the fix would change the resulting data while the current version has already been widely used, so I reverted the change. In any case, since our global rotation representation is velocity based, this only changes the first frame, and I don't expect it to make a big difference in the final results (see the sketch after this list).
- The second time, this is to make all motions face Z+ at the beginning. It is a data processing step that gives all data a uniform initial direction; it basically rotates the whole motion by the facing angle of the first pose. Again, since our global rotation representation is velocity based, I guess this step could be skipped, but keeping it is safer.
- The third and fourth times are not data processing. Here we want to disentangle the global rotation from the local rotation/position/velocity, so that the local rotation/position/velocity carries only global-invariant local information. This disentanglement is easier for the network to learn. That is why you can see we cancel the global rotation for the positions and velocities of all poses.
- In get_cont6d_params, we first get the rotation-invariant velocity and then get the root rotation velocity from the rotations. Again, we want to disentangle root rotation from root velocity; the sketch below spells out the angular-velocity part.
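A minimal sketch (written for this thread, not taken from the repo) of the root angular velocity used above, and of why the root_quat[0] initialization questioned in the first point only touches the first frame: with r_velocity[t] = r_rot[t+1] * r_rot[t]^-1, replacing r_rot[0] with the identity can only change r_velocity[0]. The qinv/qmul helpers below are simplified stand-ins for the repo's qinv_np/qmul_np, and the yaw quaternions are toy data.

    import numpy as np

    def qinv(q):
        # inverse == conjugate for unit quaternions in (w, x, y, z) order
        return q * np.array([1.0, -1.0, -1.0, -1.0])

    def qmul(a, b):
        # Hamilton product of quaternions in (w, x, y, z) order
        w1, x1, y1, z1 = a[..., 0], a[..., 1], a[..., 2], a[..., 3]
        w2, x2, y2, z2 = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
        return np.stack([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                         w1*x2 + x1*w2 + y1*z2 - z1*y2,
                         w1*y2 - x1*z2 + y1*w2 + z1*x2,
                         w1*z2 + x1*y2 - y1*x2 + z1*w2], axis=-1)

    # toy per-frame root yaw rotations about the Y axis
    angles = np.linspace(0.3, 1.5, 6)
    r_rot = np.stack([[np.cos(a / 2), 0.0, np.sin(a / 2), 0.0] for a in angles])

    # root angular velocity, as in get_cont6d_params: relative rotation between consecutive frames
    # (the root linear velocity is handled analogously: qrot_np(r_rot[1:], root_pos[1:] - root_pos[:-1]))
    r_velocity = qmul(r_rot[1:], qinv(r_rot[:-1]))

    # override the first root rotation with the identity, as questioned above
    r_rot_fixed = r_rot.copy()
    r_rot_fixed[0] = [1.0, 0.0, 0.0, 0.0]
    r_velocity_fixed = qmul(r_rot_fixed[1:], qinv(r_rot_fixed[:-1]))

    # only the first angular-velocity entry differs; all later frames are untouched
    print(np.abs(r_velocity - r_velocity_fixed).max(axis=-1))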
Global position for local velocity: I got the idea from the PFNN work. I guess it should be okay to obtain the local velocity from local positions; the two may actually be identical. I haven't had time to validate this myself, but I don't think it makes a big difference.

Difference: I didn't expect this. I guess it is because the two calculations have minor discrepancies; for example, the root velocity uses qrot_np(r_rot[1:], xxx), while the local velocity uses qrot_np(np.repeat(r_rot[:-1, None], xxx)). In practice we only need to keep the root velocity, and during recovery you should always use the root velocity (dim=1, 2).
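For anyone who wants to check the "may be identical" conjecture numerically, something along these lines could work. This is my own sketch, not repo code; it assumes the qrot_np helper used in the snippets above (from common/quaternion.py), and root_xz is a hypothetical array holding the root trajectory with its y component zeroed.

    import numpy as np
    from common.quaternion import qrot_np  # assuming the repo's quaternion helpers

    def compare_local_velocities(global_positions, r_rot, root_xz):
        """global_positions: (T, J, 3); r_rot: (T, 4) per-frame root facing quaternions;
        root_xz: (T, 1, 3) root trajectory with the y component set to zero."""
        J = global_positions.shape[1]
        # notebook-style: rotate frame differences of the global positions into the root frame
        vel_from_global = qrot_np(np.repeat(r_rot[:-1, None], J, axis=1),
                                  global_positions[1:] - global_positions[:-1])
        # alternative: localize the positions first (as get_rifke does), then take frame differences
        local_pos = qrot_np(np.repeat(r_rot[:, None], J, axis=1),
                            global_positions - root_xz)
        vel_from_local = local_pos[1:] - local_pos[:-1]
        # the maximum deviation shows how far the two definitions are from being identical
        return np.abs(vel_from_global - vel_from_local).max()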
Hope these clarify your concerns.
@EricGuo5513 I have a similar problem regarding the first point. Could you please tell me the effect of the following operation? Why do we need to calculate the root_quat?

    '''Get Root Rotation'''
    target = np.array([[0, 0, 1]]).repeat(len(forward), axis=0)
    root_quat = qbetween_np(forward, target)
    for chain in self._kinematic_tree:
        R = root_quat
        for j in range(len(chain) - 1):
            # (batch, 3)
            u = self._raw_offset_np[chain[j+1]][np.newaxis, ...].repeat(len(joints), axis=0)
            # print(u.shape)
            # (batch, 3)
            v = joints[:, chain[j+1]] - joints[:, chain[j]]
            v = v / np.sqrt((v**2).sum(axis=-1))[:, np.newaxis]
            # print(u.shape, v.shape)
            rot_u_v = qbetween_np(u, v)
            R_loc = qmul_np(qinv_np(R), rot_u_v)
            quat_params[:, chain[j + 1], :] = R_loc
            R = qmul_np(R, R_loc)
@EricGuo5513 I am also confused by this. What is the purpose?
@LinghaoChan @EricGuo5513 Could this be the source of the mismatch problem in body parts compared to the text reference, as I mentioned in issue #85? It seems that not all motions initially face Z+. For HumanML3D skeleton samples of poses that don't face Z+, this results in an incorrect text reference for the pose: a motion executed with the right hand is described in the text as the left hand, and the same may happen with clockwise/counterclockwise, forward/backward, etc.
Yep. I am still confused.
I think there is a relation between this issue and issues #55, #20, #45, and #85. The Z+ initialization and swapping do not seem to work as intended, because there are still samples that don't face the camera view. When I run an animation for the text reference "a person waving with the left hand", the person is actually waving with the right hand. I don't know whether this somehow doesn't appear in the SMPL representation, or whether these samples just haven't been visualized.
@rd20karim Can you provide the file with the error, e.g. the filename?
@LinghaoChan
All files where the person doesn't face the camera view seem to have this problem. For example, from the test set:
Skeleton faces the opposite view from the camera
Raises his left arm instead of his right arm
sample id 158 / reference: a person raises his right arm and then waves at someone
The right leg executes the motion instead of the left
sample id 55 / reference: a person kicked with left leg
@rd20karim Your indices don't seem to match mine. Your id 158 corresponds to 002651 for me.
I visualized the unmirrored and mirrored motions:
The results seem good.
@LinghaoChan The problem does not appear in the SMPL visualization, as I suspected, but it does appear in the skeleton-based visualization using the 3D joint coordinates from the .npy files: the skeleton doesn't face the camera view, and the left/right body parts are inverted relative to the description.
@rd20karim Could you please share the code for visualization?
@LinghaoChan Yes, here is the code; the path may need to be modified.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

sample_path = "./HumanML3D/new_joints/002651.npy"
joint_poses = np.load(sample_path)  # shape (T, 22, 3)
shift_scale = joint_poses.mean(0).mean(1)  # unused

x = joint_poses[:, :, 0]
y = joint_poses[:, :, 1]
z = joint_poses[:, :, 2]
min_x, min_z, min_y, max_x, max_z, max_y = x.min(), z.min(), y.min(), x.max(), z.max(), y.max()

def plot_frame_3d(x, y, z, fig=None, ax=None):
    # scatter one frame of joints with axis limits fixed over the whole sequence
    if fig is None:
        fig = plt.figure()
    if ax is None:
        ax = plt.axes(projection='3d')
    ax.scatter(x, y, z, c='red', marker='.')
    ax.set_xlim3d([min_x, max_x])
    ax.set_ylim3d([min_z, max_z])
    ax.set_zlim3d([min_y, max_y])
    ax.set_xlabel('X Label')
    ax.set_ylabel('Y Label')
    ax.set_zlabel('Z Label')
    return ax, fig

def animate_3d(x, y, z, fps=20):
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    frames = x.shape[0]

    def animate(i):
        plt.cla()
        ax_f, fig_f = plot_frame_3d(x[i], y[i], z[i], fig, ax)
        return ax_f

    return animation.FuncAnimation(fig, animate,
                                   frames=frames, interval=1000. / float(fps), blit=False)

# note: y and z are swapped here
anims = animate_3d(x, z, y)
anims.save("_test.mp4")
@LinghaoChan The problem is solved. After discussing with the author, I found that the y and z axes should not be swapped for HumanML3D (unlike KIT-ML); instead, only the camera view should be changed, via elevation and azimuth. This small detail makes a big difference in the visualization: swapping y and z produces another, mirrored version of the motion, which does not necessarily face the camera view.
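For reference, the fix amounts to keeping the (x, y, z) order from the .npy file and orienting the camera instead of swapping axes, roughly like the sketch below (my own snippet; the elev/azim values are only illustrative, not taken from any official plotting script):

    import numpy as np
    import matplotlib.pyplot as plt

    joint_poses = np.load("./HumanML3D/new_joints/002651.npy")  # (T, 22, 3), y is the up axis

    fig = plt.figure()
    ax = plt.axes(projection='3d')
    # plot one frame without swapping any axes
    ax.scatter(joint_poses[0, :, 0], joint_poses[0, :, 1], joint_poses[0, :, 2],
               c='red', marker='.')
    # orient the camera via elevation/azimuth instead of swapping y and z
    ax.view_init(elev=110, azim=-90)
    plt.show()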
@rd20karim I am sorry for not replying to you sooner. Thanks for your clarification.
For disentangling the root orientation, don't you have to use the inverse of the orientation? Also, r_rot = (1, 0, 0, 0), so why would rotating by it have any effect?