character-motion-vaes
How to generate mocap.npz?
Hi, how do I generate mocap.npz? It doesn't seem easy to me. Can you give a clue on how to generate mocap.npz from a public mocap dataset?
The train_mvae.py script assumes the mocap data is at environments/mocap.npz. The original training data is not included in this repo, but it can easily be extracted from other public datasets.
Thanks very much!
BEST
+1, could you provide some details and format description about the mocap data? Thanks!
@fabiozinno @belinghy
> +1, could you provide some details and format description about the mocap data? Thanks!
@Minotaur-CN @OOF-dura
The raw data format is not complex; just read the code below:
- https://github.com/electronicarts/character-motion-vaes/blob/main/vae_motion/train_mvae.py#L140
- https://github.com/electronicarts/character-motion-vaes/blob/main/vae_motion/train_mvae.py#L141
Below is some information about the data format. We also note the length of each mocap sequence (as mentioned above at L141). This is so we don't sample invalid transitions for training. If the mocap clip is one long continuous sequence, then there is no reason to do this.
- 0-3: root delta x, delta y, delta facing
- 3-69: joint coordinates (22 * 3 = 66)
- 69-135: joint velocities in Cartesian coordinates in the previous root frame (22 * 3 = 66)
- 135-267: 6D joint orientations, i.e. the first two columns of the rotation matrix (22 * 6 = 132)
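For concreteness, here is a minimal sketch (my own illustration, not code from this repo) of how one 267-dimensional frame vector splits into the blocks listed above:

```python
import numpy as np

frame = np.zeros(267)                           # one pose vector, dummy values
root_deltas  = frame[0:3]                       # delta x, delta y, delta facing
joint_pos    = frame[3:69].reshape(22, 3)       # joint coordinates
joint_vel    = frame[69:135].reshape(22, 3)     # joint velocities in the previous root frame
joint_rot_6d = frame[135:267].reshape(22, 6)    # first two columns of each joint's rotation matrix
```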
For extracting training data from mocap datasets, I think fairmotion might be helpful. Based on the examples I have seen (though I haven't tested it), it should be something like the code below. Root deltas need some more processing: essentially, find the displacement vector and rotate it by the current facing direction of the character. The same goes for positions and velocities; they should be projected into the character space to make learning easier.
```python
from fairmotion.data import bvh

motion = bvh.load(BVH_FILENAME)
positions = motion.positions(local=False)  # (frames, joints, 3)
velocities = positions[1:] - positions[:-1]
orientations = motion.rotations(local=False)[..., :, :2].reshape(-1, 22, 6)
```
@belinghy can you tell me more about how to get the root deltas?
I think a sample, formula, or code would help.
I may have misunderstood the whole process, but since there isn't any sample of mocap.npz, I assume that mocap.npz should look like this:

mocap.npz
- "data": a list of 267 float numbers (the first part, about root deltas, is described in the paper's "pose representation" section) * number of frames
- "end_indices": 267? (the length of each mocap sequence)
It seems the mocap data has to include exactly 22 joints, so extracting from other public datasets may not work directly, since BVH files or other mocap data out there may have a different number of joints. Even after discarding irrelevant joints from the data, the joint index order is another issue, as you can see in mocap_env.py :(
Therefore, I think there are two ways to solve this:
- modify the joint-related parts of this project (mocap_env.py, etc.)
- or discard extra joints and keep the expected joint index order, referring to pose0.npy (not sure about this; a rough sketch follows below)
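Purely as an illustration of the second option, a reindexing step could look like the sketch below; the joint index list is made up, and the real mapping depends on the source skeleton and on pose0.npy:

```python
import numpy as np

# Hypothetical joint mapping: indices into the source skeleton's joint list,
# reordered to match the 22-joint order this project expects (e.g. as in pose0.npy).
# These numbers are placeholders, not the actual mapping.
JOINT_INDEX = np.array([0, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13,
                        14, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26])

# positions: (frames, source_joints, 3) array extracted from the source mocap data
positions = np.zeros((100, 31, 3))            # dummy data for the sketch
positions_22 = positions[:, JOINT_INDEX, :]   # keep only the 22 expected joints, reordered
```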
I wasn't able to find which mocap database this project used, and it wasn't mentioned in the paper.. :(
Your understanding of the format is correct, except that end_indices marks the end of each mocap clip. It depends on the number of mocap clips you have, so it is not necessarily 267. For example, if there are two clips of lengths 10 and 15, then end_indices = np.cumsum([10, 15]) - 1 = [9, 24].
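As a sketch of how the file itself could be assembled (the data / end_indices keys and the environments/mocap.npz path follow the discussion above; the clip contents are placeholders):

```python
import numpy as np

# Two hypothetical clips of 10 and 15 frames, each frame being a 267-dim pose vector.
clips = [np.zeros((10, 267)), np.zeros((15, 267))]

data = np.concatenate(clips, axis=0)                   # (25, 267)
end_indices = np.cumsum([len(c) for c in clips]) - 1   # array([ 9, 24])

np.savez("environments/mocap.npz", data=data, end_indices=end_indices)
```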
As you've noted, mocap_env.py could definitely be refactored. I think the only things to change if you are using a different input format are these lines and these lines. The second reference only matters for the `0-3: root delta x, delta y, delta facing` part. Am I missing anything else?
So, as mentioned above, if I get this right, end_indices might contain just one integer value if the input clip is one long continuous sequence. However, I still don't get what "length" is in this case. Is it a frame number, or is some other unit used?
Yes, it's a frame number. end_indices contains one integer value if there is exactly one input clip, i.e. one long continuous sequence.
Hi, I have some confusion about

> 135-267: 6D joint orientations, i.e. the first two columns of the rotation matrix (22 * 6 = 132)
> `orientations = motion.rotations(local=False)[..., :, :2].reshape(-1, 22, 6)`

In your case, is the z-axis the world up vector, and are the 6D joint orientations the orientations of the other two directions?
Furthermore, can you provide some examples for `0-3: root delta x, delta y, delta facing`? I am a bit confused about the definition of these variables.
Thank you
Maybe this will help: https://arxiv.org/pdf/2103.14274.pdf (see the pose representation section for root information).
I think the paper and the code are slightly different in terms of which up-vector they use. Overall, the root delta has to include two values for the root position projected on the ground plus the root facing direction, and the joint orientations have to encode a form of rotation matrix built from relative forward and upward vectors.
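A rough, untested sketch of the root delta idea (my own; the variable names are made up, and which two ground-plane axes you use depends on the up-vector convention of your data):

```python
import numpy as np

def root_deltas(root_xy, facing):
    """root_xy: (frames, 2) root positions projected on the ground plane (hypothetical input);
    facing: (frames,) facing angle in radians (hypothetical input)."""
    disp = root_xy[1:] - root_xy[:-1]                 # world-space displacement per step
    c, s = np.cos(facing[:-1]), np.sin(facing[:-1])
    # rotate the displacement into the frame of the current facing direction
    dx =  c * disp[:, 0] + s * disp[:, 1]
    dy = -s * disp[:, 0] + c * disp[:, 1]
    dfacing = facing[1:] - facing[:-1]                # change in facing angle (may need wrapping to [-pi, pi])
    return np.stack([dx, dy, dfacing], axis=-1)       # (frames - 1, 3)
```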
Hello,
@belinghy, when reshaping the rotation components returned by fairmotion with `orientations = motion.rotations(local=False)[..., :, :2].reshape(-1, 22, 6)`, how would the vector components be distributed?
Considering the first joint in the first frame, where the 6 values on the third dimension contain the 3 components of each of the first 2 columns of the rotation matrix, would they be laid out as in the first version below or as in the second one?
version 1:
```
orientations[0, 0, 0] = comp1_x
orientations[0, 0, 1] = comp1_y
orientations[0, 0, 2] = comp1_z
orientations[0, 0, 3] = comp2_x
orientations[0, 0, 4] = comp2_y
orientations[0, 0, 5] = comp2_z
```
version 2:
```
orientations[0, 0, 0] = comp1_x
orientations[0, 0, 1] = comp2_x
orientations[0, 0, 2] = comp1_y
orientations[0, 0, 3] = comp2_y
orientations[0, 0, 4] = comp1_z
orientations[0, 0, 5] = comp2_z
```
Hi @Gabriel-Bercaru, I'm not sure what fairmotion's convention is. Are you rendering the character using joint orientations? If not, for the purpose of neural network input, the order shouldn't matter.
Hello, indeed, for the input training data it doesn't really matter, but I was trying to render a mesh on top of a trained model.
As far as I have seen, rigging makes use of the joint orientations, and in order to get them I should convert those 6D orientation vectors to either Euler rotations or quaternions.
The way it's indexed, i.e. `[..., :, :2]`, should correspond to version 2.
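In case it helps for rendering, here is a sketch (not from this repo) of the usual Gram-Schmidt way to recover a full rotation matrix from the 6D representation, assuming the version 2 interleaved layout; the result can then be converted to quaternions or Euler angles with e.g. scipy.spatial.transform.Rotation.from_matrix:

```python
import numpy as np

def sixd_to_rotmat(sixd):
    """sixd: (..., 6) laid out as [r0c0, r0c1, r1c0, r1c1, r2c0, r2c1] (version 2 above)."""
    m = sixd.reshape(*sixd.shape[:-1], 3, 2)       # columns are the first two columns of R
    a1, a2 = m[..., 0], m[..., 1]
    b1 = a1 / np.linalg.norm(a1, axis=-1, keepdims=True)
    a2 = a2 - np.sum(b1 * a2, axis=-1, keepdims=True) * b1   # Gram-Schmidt step
    b2 = a2 / np.linalg.norm(a2, axis=-1, keepdims=True)
    b3 = np.cross(b1, b2)                          # third column from the cross product
    return np.stack([b1, b2, b3], axis=-1)         # (..., 3, 3) rotation matrices
```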