torchani
[WIP] CUAEV aev constants memory optimization
Before:
   RUN                      Total AEV  Forward  Backward  Force     Optimizer  Others  Epoch time  GPU
0  cu Energy train          21.9 ms    13.1 ms  15.5 ms   0.0 ms    11.0 ms    3.6 ms  65.2 ms     1315.5 MB
1  py Energy train          80.7 ms    12.9 ms  16.5 ms   0.0 ms    13.0 ms    3.7 ms  126.8 ms    2123.5 MB
2  cu Energy + Force train  21.7 ms    12.9 ms  131.8 ms  75.2 ms   6.7 ms     5.0 ms  253.3 ms    2761.5 MB
3  py Energy + Force train  80.2 ms    12.9 ms  435.0 ms  167.4 ms  6.9 ms     5.4 ms  707.8 ms    5667.5 MB
After:
   RUN                      Total AEV  Forward  Backward  Force     Optimizer  Others  Epoch time  GPU
0  cu Energy train          21.4 ms    12.6 ms  15.6 ms   0.0 ms    10.0 ms    3.3 ms  63.0 ms     1321.5 MB
1  py Energy train          80.2 ms    12.6 ms  16.1 ms   0.0 ms    12.4 ms    3.7 ms  125.1 ms    2129.5 MB
2  cu Energy + Force train  21.3 ms    12.6 ms  125.7 ms  73.9 ms   6.4 ms     5.4 ms  245.2 ms    2769.5 MB
3  py Energy + Force train  79.5 ms    12.5 ms  433.0 ms  166.3 ms  6.6 ms     5.6 ms  703.5 ms    5673.5 MB
After using shared memory for ShfZ and ShfA:
   RUN                      Total AEV  Forward  Backward  Force     Optimizer  Others  Epoch time  GPU
0  cu Energy train          21.4 ms    12.7 ms  15.6 ms   0.0 ms    11.1 ms    3.4 ms  64.1 ms     1321.5 MB
1  py Energy train          80.3 ms    12.5 ms  15.9 ms   0.0 ms    13.7 ms    3.6 ms  126.0 ms    2129.5 MB
2  cu Energy + Force train  21.2 ms    12.6 ms  122.9 ms  72.1 ms   7.6 ms     5.0 ms  241.4 ms    2769.5 MB
3  py Energy + Force train  79.8 ms    12.5 ms  433.0 ms  166.5 ms  8.0 ms     5.6 ms  705.4 ms    5673.5 MB
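
The differences between the three benchmark runs are small, so it can help to express them as relative changes against the "before" numbers. The following is a minimal sketch that does this arithmetic for the epoch time and GPU memory columns. The values are copied from the tables above; the names `RUNS`, `EPOCH_MS`, `GPU_MB`, `percent_change`, and `summarize` are illustrative and are not part of torchani.

```python
# Epoch time (ms) and GPU memory (MB) copied from the benchmark tables above.
# Stage keys are illustrative labels: "before", "after", and "shared"
# (the run that uses shared memory for ShfZ and ShfA).
RUNS = ["cu Energy", "py Energy", "cu Energy + Force", "py Energy + Force"]

EPOCH_MS = {
    "before": [65.2, 126.8, 253.3, 707.8],
    "after": [63.0, 125.1, 245.2, 703.5],
    "shared": [64.1, 126.0, 241.4, 705.4],
}

GPU_MB = {
    "before": [1315.5, 2123.5, 2761.5, 5667.5],
    "after": [1321.5, 2129.5, 2769.5, 5673.5],
    "shared": [1321.5, 2129.5, 2769.5, 5673.5],
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means smaller)."""
    return (new - old) / old * 100.0


def summarize(stage):
    """Return (run, epoch-time change in %, GPU memory change in MB) vs. 'before'."""
    rows = []
    for i, name in enumerate(RUNS):
        epoch_pct = percent_change(EPOCH_MS["before"][i], EPOCH_MS[stage][i])
        mem_delta = GPU_MB[stage][i] - GPU_MB["before"][i]
        rows.append((name, epoch_pct, mem_delta))
    return rows


if __name__ == "__main__":
    for stage in ("after", "shared"):
        print(stage)
        for name, epoch_pct, mem_delta in summarize(stage):
            print(f"  {name:<18} epoch {epoch_pct:+.2f} %   GPU {mem_delta:+.1f} MB")
```

For the `cu Energy + Force` run, this gives an epoch time roughly 4.7 % lower with shared memory than before, at a cost of 8 MB more GPU memory. The other three runs change by less than 2 %, which may be within run-to-run noise since each table reports a single measurement.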