NeuralClothSim
Model Struggling to Learn Cloth Dynamics
I am running training with the T-Shirt garment on ~700 animations downloaded from Mixamo (via mixamo_anims_downloader).
In the config, I have increased `batch_size` to 300 (any more and I exhaust the 12GB of VRAM on my RTX 4070) but have left everything else at its default (i.e. `temporal_window_size = 0.5` and `reflect_probability = motion_augmentation = 0.0`).
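For reference, the relevant part of my config looks roughly like this (sketched as a Python dict using the parameter names mentioned above; the actual file format in the repo may differ, so treat this as an assumption):

```python
# Sketch of my config overrides; key names are the parameters discussed
# above, not necessarily the repo's real schema.
config = {
    "batch_size": 300,            # raised from default; hits the 12GB VRAM ceiling on an RTX 4070
    "temporal_window_size": 0.5,  # default
    "reflect_probability": 0.0,   # default (no reflection augmentation)
    "motion_augmentation": 0.0,   # default (no motion augmentation)
}
```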
Unfortunately, after training this model for 100 epochs (which does not sound like much, but each epoch is made up of 209 batches of 300 sequences) I am struggling to see much evidence of cloth dynamics, e.g. (technically these gifs are from an earlier epoch, but the output barely changed after training for longer):
Compared to setting `motion = 0.0`:
Maybe it is just really subtle, but I can barely see a difference.
I think this is the same animation as the one at 1:26 in https://youtu.be/6HxXLBzRXFg?t=86, so there's clearly a pretty huge gap between the results I'm getting and what was obtained for that video.
On that 100th epoch, I got the following metrics:
```
m/Loss: 1.0557 - m/Stretch: 6.5842e-06 - m/Shear: 0.1285 - m/Bending: 0.1992 - m/Collision: 0.0041 - m/Gravity: 1.0049 - m/Inertia: 0.0049
```
FYI after just 5 batches (not epochs!) of training, the metrics were:
```
m/Loss: 1.6685 - m/Stretch: 2.6247e-05 - m/Shear: 0.2460 - m/Bending: 0.0590 - m/Collision: 0.0359 - m/Gravity: 1.0188 - m/Inertia: 0.0046
```
So the `Stretch`, `Shear` and `Collision` losses all improved quite a bit, and `Gravity` to a lesser extent, but `Bending` actually went up and the inertia loss barely changed (if anything, it got slightly worse). Perhaps the inertia loss starting off so tiny could be the cause of the model seemingly not learning dynamics?
I also tried a very unscientific test: printing the mean and max magnitude of the values in the dynamic and static encodings (just before running the decoder):
```python
import tensorflow as tf

# Report the magnitudes of the static and dynamic encodings, just before
# they are passed to the decoder.
x_static_abs = tf.abs(x_static)
x_dynamic_abs = tf.abs(x_dynamic)
for i in range(tf.shape(x_static)[1]):  # iterate over the frame axis
    x_static_slice = x_static_abs[:, i, :]
    x_dynamic_slice = x_dynamic_abs[:, i, :]
    x_static_mean = tf.math.reduce_mean(x_static_slice)
    x_static_max = tf.math.reduce_max(x_static_slice)
    x_dynamic_mean = tf.math.reduce_mean(x_dynamic_slice)
    x_dynamic_max = tf.math.reduce_max(x_dynamic_slice)
    print(f"Means: Static - {x_static_mean}, Dynamic - {x_dynamic_mean}\n"
          f"Maximums: Static - {x_static_max}, Dynamic - {x_dynamic_max}")
```
and got (for the last frame of the above animation):

```
Means: Static - 0.015779726207256317, Dynamic - 0.013485459610819817
Maximums: Static - 0.14864373207092285, Dynamic - 0.062263891100883484
```
Obviously the average/maximum magnitude of the values in the encodings won't always correlate with the size of the output deformations, but at least it looks like the dynamic encoder is having some influence on the final output, just not anything that resembles coherent cloth dynamics.
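A more direct test I might try next (sketched below; `decoder` and the way it consumes the two encodings are assumptions about my own pipeline, not the repo's actual API) is to decode once with the dynamic encoding zeroed out and measure how far the vertices move:

```python
import tensorflow as tf

# Hypothetical ablation: decode with and without the dynamic encoding and
# measure the per-vertex displacement it is responsible for.
verts_full = decoder([x_static, x_dynamic])
verts_static_only = decoder([x_static, tf.zeros_like(x_dynamic)])

displacement = tf.norm(verts_full - verts_static_only, axis=-1)
print("Mean dynamic displacement:", float(tf.reduce_mean(displacement)))
print("Max dynamic displacement: ", float(tf.reduce_max(displacement)))
```

If those displacements are tiny relative to the garment size, that would confirm the dynamic branch is effectively being ignored.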
My main question then: am I doing anything obviously wrong? I guess the most obvious thing left to try is to train for more epochs, but the paper did mention that simple garments should only take an hour to train (a day at most). Training for 100 epochs took about a day on a 4070, and the losses seem to be decreasing very slowly. Regardless, I will try running training over a few more days and update this issue if I manage to get a better result.

Any other ideas for what I might be doing wrong (do I need to train with a larger batch size?) or other assistance in general (perhaps sharing the exact config/set of Mixamo animations used to train the model shown in the paper?) would be very much appreciated. Thanks!!