stylegan-v
Large GPU memory consumption at the beginning of training
Hi, thanks for the great work!
I ran the code on 8 A100 GPUs and found that GPU memory consumption is extremely high during the first several ticks. Here is the output log:
tick 0 kimg 0.2 time 1m 31s sec/tick 21.8 sec/kimg 113.36 maintenance 69.7 cpumem 4.70 gpumem 67.21 augment 0.000
Evaluating metrics for 3sky_timelapse_256_stylegan-v_random3_max32_3-4468dd1 ...
{"results": {"fvd2048_16f": 992.2131880075198}, "metric": "fvd2048_16f", "total_time": 80.59011363983154, "total_time_str": "1m 21s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261186.965401}
{"results": {"fvd2048_128f": 1764.0538105755193}, "metric": "fvd2048_128f", "total_time": 230.15506172180176, "total_time_str": "3m 50s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261417.1899228}
{"results": {"fvd2048_128f_subsample8f": 1241.4737946211158}, "metric": "fvd2048_128f_subsample8f", "total_time": 54.82384514808655, "total_time_str": "55s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261472.0662923}
{"results": {"fid50k_full": 381.6109859044359}, "metric": "fid50k_full", "total_time": 83.23335003852844, "total_time_str": "1m 23s", "num_gpus": 8, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1655261555.3598747}
tick 1 kimg 5.4 time 11m 28s sec/tick 23.0 sec/kimg 4.43 maintenance 573.9 cpumem 12.42 gpumem 69.14 augment 0.000
tick 2 kimg 10.6 time 11m 50s sec/tick 21.4 sec/kimg 4.13 maintenance 0.0 cpumem 12.42 gpumem 10.26 augment 0.000
tick 3 kimg 15.7 time 12m 11s sec/tick 21.6 sec/kimg 4.17 maintenance 0.0 cpumem 12.42 gpumem 10.26 augment 0.000
tick 4 kimg 20.9 time 12m 33s sec/tick 21.4 sec/kimg 4.12 maintenance 0.0 cpumem 12.42 gpumem 10.26 augment 0.003
tick 5 kimg 26.1 time 12m 55s sec/tick 21.7 sec/kimg 4.18 maintenance 0.0 cpumem 12.42 gpumem 10.29 augment 0.010
tick 6 kimg 31.3 time 13m 16s sec/tick 21.9 sec/kimg 4.22 maintenance 0.0 cpumem 12.42 gpumem 10.29 augment 0.026
tick 7 kimg 36.5 time 13m 39s sec/tick 22.4 sec/kimg 4.32 maintenance 0.0 cpumem 12.42 gpumem 10.33 augment 0.038
tick 8 kimg 41.7 time 14m 00s sec/tick 21.6 sec/kimg 4.16 maintenance 0.1 cpumem 12.42 gpumem 10.33 augment 0.036
tick 9 kimg 46.8 time 14m 23s sec/tick 22.3 sec/kimg 4.30 maintenance 0.1 cpumem 12.42 gpumem 10.32 augment 0.038
tick 10 kimg 52.0 time 14m 44s sec/tick 21.4 sec/kimg 4.13 maintenance 0.0 cpumem 12.42 gpumem 10.33 augment 0.028
As you can see, the gpumem reported for the first two ticks (67.21 GB and 69.14 GB) is abnormally high compared to the ~10 GB used from tick 2 onward. Do you have any idea what causes this?
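For reference, the spike is easy to confirm mechanically from the log itself. Here is a minimal sketch (pure Python, assuming only the `tick … gpumem …` line format shown above) that parses the stats lines and flags ticks whose reported gpumem is far above the median:

```python
import re
import statistics

def parse_gpumem(log_lines):
    # Pull (tick, gpumem in GB) pairs out of lines like
    # "tick 2  kimg 10.6  ... gpumem 10.26 augment 0.000"
    pairs = []
    for line in log_lines:
        m = re.search(r"tick\s+(\d+).*gpumem\s+([\d.]+)", line)
        if m:
            pairs.append((int(m.group(1)), float(m.group(2))))
    return pairs

def flag_spikes(pairs, factor=3.0):
    # Report ticks whose gpumem exceeds `factor` x the median over all ticks.
    median = statistics.median(g for _, g in pairs)
    return [(tick, g) for tick, g in pairs if g > factor * median]

# A few lines copied from the log above:
log = [
    "tick 0 kimg 0.2 gpumem 67.21 augment 0.000",
    "tick 1 kimg 5.4 gpumem 69.14 augment 0.000",
    "tick 2 kimg 10.6 gpumem 10.26 augment 0.000",
    "tick 3 kimg 15.7 gpumem 10.26 augment 0.000",
    "tick 4 kimg 20.9 gpumem 10.26 augment 0.003",
]
print(flag_spikes(parse_gpumem(log)))  # prints [(0, 67.21), (1, 69.14)]
```

Note that StyleGAN-style training code typically logs the *peak* allocation since the last tick (e.g. via `torch.cuda.max_memory_allocated`), so a one-off spike during setup or the first evaluation pass can dominate the numbers for ticks 0 and 1 even if steady-state usage is much lower.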
I am experiencing the same problem: a single training forward pass consumes 46 GB of VRAM, while inference takes less than 8 GB. Is there a solution for this?
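Part of that training/inference gap is expected: autograd keeps every intermediate activation alive for the backward pass, whereas inference (e.g. under `torch.no_grad()`) can free each feature map as soon as it is consumed. A rough back-of-envelope sketch of how activations alone add up (pure Python; the layer shapes below are made up for illustration and are not the actual StyleGAN-V architecture):

```python
def activation_bytes(shapes, dtype_bytes=4):
    # Total bytes of all intermediate feature maps, assuming each one is
    # retained for the backward pass (float32 = 4 bytes per element).
    total = 0
    for (n, c, h, w) in shapes:
        total += n * c * h * w * dtype_bytes
    return total

# Hypothetical generator feature-map shapes for a batch of 32 at 256x256:
shapes = [
    (32, 512, 4, 4), (32, 512, 8, 8), (32, 512, 16, 16),
    (32, 512, 32, 32), (32, 256, 64, 64), (32, 128, 128, 128),
    (32, 64, 256, 256),
]
print(f"{activation_bytes(shapes) / 2**30:.2f} GiB")  # prints "0.96 GiB"
```

Multiply this by the number of stored activations per layer (pre- and post-nonlinearity, normalization buffers, both G and D, plus optimizer state and gradients) and tens of GB at a large batch size is plausible; reducing the per-GPU batch size or using activation checkpointing are the usual levers.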