
Model Struggling to Learn Cloth Dynamics

NathanielB123 opened this issue 8 months ago · 1 comment

I am running training with the T-Shirt garment on ~700 animations downloaded from Mixamo (via mixamo_anims_downloader).

In the config, I have increased batch_size to 300 (any more and I exhaust the 12GB of VRAM on my RTX 4070) but have left everything else at its default (i.e. temporal_window_size = 0.5 and reflect_probability = motion_augmentation = 0.0).
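To be explicit, these are the only settings I'm effectively running with, written out as a plain Python dict purely for illustration (this is not the repo's actual config format, just the values I described above):

# Illustrative only: my effective settings, not the repo's real config file.
config_overrides = {
    "batch_size": 300,            # raised from the default; anything higher OOMs on 12GB
    "temporal_window_size": 0.5,  # left at default
    "reflect_probability": 0.0,   # left at default
    "motion_augmentation": 0.0,   # left at default
}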

Unfortunately, after training this model for 100 epochs (which does not sound like much, but each epoch is made up of 209 batches of 300 sequences), I am struggling to see much evidence of cloth dynamics, e.g. (technically these GIFs are from an earlier epoch, but the output barely changed after training for longer):

[GIF: RunWithTorch]

Compared to setting motion=0.0:

[GIF: RunWithTorchNoMotion]

Maybe it is just really subtle, but I can barely see a difference.

I think this is the same animation as the one at 1:26 in https://youtu.be/6HxXLBzRXFg?t=86, so there's clearly a pretty huge gap between the results I'm getting and what was shown in that video.

On that 100th epoch, I got the following metrics:

m/Loss: 1.0557 - m/Stretch: 6.5842e-06 - m/Shear: 0.1285 - m/Bending: 0.1992 - m/Collision: 0.0041 - m/Gravity: 1.0049 - m/Inertia: 0.0049

FYI after just 5 batches (not epochs!) of training, the metrics were:

m/Loss: 1.6685 - m/Stretch: 2.6247e-05 - m/Shear: 0.2460 - m/Bending: 0.0590 - m/Collision: 0.0359 - m/Gravity: 1.0188 - m/Inertia: 0.0046

So the Stretch, Shear, and Collision losses all improved quite a bit and Gravity improved slightly, but Bending actually got worse and the Inertia loss barely changed (if anything, it also got slightly worse). Perhaps the Inertia loss starting off so tiny could be the cause of the model seemingly not learning dynamics?
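To make that concrete, this is the kind of re-weighting I'm considering trying. It's purely a sketch: the weight values and the variable names (loss_weights, loss_terms) are placeholders I made up, not the repo's actual names.

import tensorflow as tf

# Hypothetical sketch: scale up the inertia term so it isn't dwarfed by the other losses.
# Names and values are illustrative only.
loss_weights = {
    "stretch": 1.0,
    "shear": 1.0,
    "bending": 1.0,
    "collision": 1.0,
    "gravity": 1.0,
    "inertia": 10.0,  # bumped relative to the others
}

def total_loss(loss_terms):
    # loss_terms: dict of scalar tensors, e.g. {"stretch": ..., "inertia": ...}
    return tf.add_n([loss_weights[name] * value for name, value in loss_terms.items()])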

I also tried a very unscientific test: printing the mean and max magnitudes of the values in the dynamic and static encodings (just before running the decoder):

import tensorflow as tf  # assumes eager execution, as in the training scripts

# Report the mean and max absolute value of the static and dynamic encodings,
# per slice along axis 1, just before they are fed to the decoder.
x_static_abs = tf.abs(x_static)
x_dynamic_abs = tf.abs(x_dynamic)
for i in range(tf_shape(x_static)[1]):  # tf_shape: shape helper used in the codebase
  x_static_slice = x_static_abs[:, i, :]
  x_dynamic_slice = x_dynamic_abs[:, i, :]
  x_static_mean = tf.math.reduce_mean(x_static_slice)
  x_static_max = tf.math.reduce_max(x_static_slice)
  x_dynamic_mean = tf.math.reduce_mean(x_dynamic_slice)
  x_dynamic_max = tf.math.reduce_max(x_dynamic_slice)
  print(f"Means: Static - {x_static_mean}, Dynamic - {x_dynamic_mean}\n"
        f"Maximums: Static - {x_static_max}, Dynamic - {x_dynamic_max}")

and got (for the last frame of the above animation):

Means: Static - 0.015779726207256317, Dynamic - 0.013485459610819817
Maximums: Static - 0.14864373207092285, Dynamic - 0.062263891100883484

Obviously, the average/maximum magnitude of the values in the encodings won't always entirely correlate with the size of the output deformations, but at least it looks like the dynamic encoder is having some influence on the final output, just not anything that resembles coherent cloth dynamics.
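A more direct check I might try next is zeroing out the dynamic encoding before the decoder and measuring how much the predicted vertices actually move. Again this is only a sketch: decoder (and its call signature) is a placeholder for wherever this lives in the real model code, while x_static and x_dynamic are the same tensors as above.

import tensorflow as tf

# Sketch of an ablation: compare decoder output with and without the dynamic encoding.
# `decoder` is a placeholder for the real decoder call, not the repo's actual API.
verts_full = decoder([x_static, x_dynamic])
verts_static_only = decoder([x_static, tf.zeros_like(x_dynamic)])

# Per-vertex displacement attributable to the dynamic branch.
delta = tf.norm(verts_full - verts_static_only, axis=-1)
print("mean |delta| =", float(tf.reduce_mean(delta)),
      "max |delta| =", float(tf.reduce_max(delta)))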

My main question, then: am I doing anything obviously wrong? The most obvious thing left to try is training for more epochs, but the paper did mention that simple garments should only take an hour to train (a day at most). Training for 100 epochs took about a day on the 4070, and the losses seem to be decreasing very slowly. Regardless, I will keep training over the next few days and update this issue if I manage to get a better result. Any other ideas about what I might be doing wrong (do I need to train with a larger batch size?), or assistance in general (perhaps sharing the exact config/set of Mixamo animations used to train the model shown in the paper?), would be very much appreciated. Thanks!!

NathanielB123 · May 28 '24 17:05