torchani

[WIP] CUAEV AEV constants memory optimization

yueyericardo opened this issue 3 years ago • 1 comment

Before (cu = AEV computed with the CUAEV CUDA extension, py = AEV computed in pure PyTorch):

RUN                        Total AEV    Forward      Backward     Force        Optimizer    Others       Epoch time   GPU
0 cu Energy train          21.9 ms      13.1 ms      15.5 ms      0.0 ms       11.0 ms      3.6 ms       65.2 ms      1315.5MB
1 py Energy train          80.7 ms      12.9 ms      16.5 ms      0.0 ms       13.0 ms      3.7 ms       126.8 ms     2123.5MB
2 cu Energy + Force train  21.7 ms      12.9 ms      131.8 ms     75.2 ms      6.7 ms       5.0 ms       253.3 ms     2761.5MB
3 py Energy + Force train  80.2 ms      12.9 ms      435.0 ms     167.4 ms     6.9 ms       5.4 ms       707.8 ms     5667.5MB

After (with the AEV constants memory optimization):

RUN                        Total AEV    Forward      Backward     Force        Optimizer    Others       Epoch time   GPU
0 cu Energy train          21.4 ms      12.6 ms      15.6 ms      0.0 ms       10.0 ms      3.3 ms       63.0 ms      1321.5MB
1 py Energy train          80.2 ms      12.6 ms      16.1 ms      0.0 ms       12.4 ms      3.7 ms       125.1 ms     2129.5MB
2 cu Energy + Force train  21.3 ms      12.6 ms      125.7 ms     73.9 ms      6.4 ms       5.4 ms       245.2 ms     2769.5MB
3 py Energy + Force train  79.5 ms      12.5 ms      433.0 ms     166.3 ms     6.6 ms       5.6 ms       703.5 ms     5673.5MB

yueyericardo • Mar 04 '21 22:03
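
To put the before/after tables side by side, here is a minimal Python sketch that computes the relative change in epoch time and GPU memory for each run. The numbers are copied from the two tables. The dictionary layout and the helper names pct_change and compare are illustrative choices and are not part of the torchani benchmark tooling.

```python
# (Epoch time in ms, GPU memory in MB), copied from the "before" and "after" tables.
BEFORE = {
    "cu Energy train": (65.2, 1315.5),
    "py Energy train": (126.8, 2123.5),
    "cu Energy + Force train": (253.3, 2761.5),
    "py Energy + Force train": (707.8, 5667.5),
}
AFTER = {
    "cu Energy train": (63.0, 1321.5),
    "py Energy train": (125.1, 2129.5),
    "cu Energy + Force train": (245.2, 2769.5),
    "py Energy + Force train": (703.5, 5673.5),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means it shrank)."""
    return (new - old) / old * 100.0


def compare(before, after):
    """Return {run: (epoch_time_change_pct, gpu_memory_change_pct)}."""
    return {
        run: (
            pct_change(before[run][0], after[run][0]),
            pct_change(before[run][1], after[run][1]),
        )
        for run in before
    }


if __name__ == "__main__":
    for run, (d_time, d_mem) in compare(BEFORE, AFTER).items():
        print(f"{run:<25} epoch time {d_time:+.1f}%   GPU memory {d_mem:+.2f}%")
```

For the two cu runs the epoch time drops by roughly 3%, while every run reports 6 to 8 MB more GPU memory. The py runs, which do not use the CUAEV kernels, move by about 1% or less, which gives a rough idea of run-to-run noise in single measurements like these.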

After using shared memory for ShfZ and ShfA:

RUN                        Total AEV    Forward      Backward     Force        Optimizer    Others       Epoch time   GPU
0 cu Energy train          21.4 ms      12.7 ms      15.6 ms      0.0 ms       11.1 ms      3.4 ms       64.1 ms      1321.5MB
1 py Energy train          80.3 ms      12.5 ms      15.9 ms      0.0 ms       13.7 ms      3.6 ms       126.0 ms     2129.5MB
2 cu Energy + Force train  21.2 ms      12.6 ms      122.9 ms     72.1 ms      7.6 ms       5.0 ms       241.4 ms     2769.5MB
3 py Energy + Force train  79.8 ms      12.5 ms      433.0 ms     166.5 ms     8.0 ms       5.6 ms       705.4 ms     5673.5MB

yueyericardo • Mar 05 '21 04:03
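
To see how the two changes accumulate for the CUDA Energy + Force run, this minimal Python sketch measures the Backward, Force, and Epoch time columns of each stage against the original baseline. The values are copied from the three tables. The stage labels and the helper name savings_vs_baseline are illustrative and are not taken from the benchmark script.

```python
# "cu Energy + Force train" row from each table: (Backward, Force, Epoch time) in ms.
STAGES = [
    ("before", (131.8, 75.2, 253.3)),
    ("after", (125.7, 73.9, 245.2)),
    ("shared mem ShfZ/ShfA", (122.9, 72.1, 241.4)),
]
COLUMNS = ("Backward", "Force", "Epoch time")


def savings_vs_baseline(stages):
    """For each stage, return the ms saved per column relative to the first stage."""
    _, baseline = stages[0]
    return [
        (name, tuple(round(b - v, 1) for b, v in zip(baseline, values)))
        for name, values in stages
    ]


if __name__ == "__main__":
    for name, saved in savings_vs_baseline(STAGES):
        cells = "   ".join(f"{col}: {ms:.1f} ms saved" for col, ms in zip(COLUMNS, saved))
        print(f"{name:<22} {cells}")
```

Measured from the original baseline, the shared-memory stage brings this run's epoch from 253.3 ms to 241.4 ms, a saving of 11.9 ms (roughly 4.7%). The largest single reduction is in the Backward column (8.9 ms). Other columns, such as Optimizer, also shift by a millisecond or so between runs, so small differences should be read with caution.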