Performance issues for large AutodiffCompositions
I'm seeing slow performance on the first call to the learn method of an AutodiffComposition. The time taken by that first call depends heavily on the size of the mechanisms in the composition and scales super-linearly. Consider this code snippet, adapted from the documentation:

    import time

    import numpy as np
    import psyneulink as pnl

    s1, s2 = 100, 200  # mechanism sizes; varied between runs
    print(f's1={s1},s2={s2}')

    # Build two linear mechanisms joined by a random s1 x s2 projection
    start_t = time.time()
    my_mech_1 = pnl.TransferMechanism(function=pnl.Linear, size=s1)
    my_mech_2 = pnl.TransferMechanism(function=pnl.Linear, size=s2)
    my_projection = pnl.MappingProjection(matrix=np.random.randn(s1, s2),
                                          sender=my_mech_1,
                                          receiver=my_mech_2)

    # Create AutodiffComposition
    my_autodiff = pnl.AutodiffComposition()
    my_autodiff.add_node(my_mech_1)
    my_autodiff.add_node(my_mech_2)
    my_autodiff.add_projection(sender=my_mech_1, projection=my_projection, receiver=my_mech_2)
    print(f' Time to create: {time.time()-start_t}')

    # Ten random binary input/target patterns
    in_patterns = np.random.binomial(1, .1, size=(10, s1))
    out_patterns = np.random.binomial(1, .2, size=(10, s2))

    # First call to learn (the slow one)
    start_t = time.time()
    k = my_autodiff.learn(inputs={'inputs': {my_mech_1: in_patterns},
                                  'targets': {my_mech_2: out_patterns}})
    print(f" First run's time: {time.time()-start_t}")

    # Second call to learn, same inputs
    start_t = time.time()
    k = my_autodiff.learn(inputs={'inputs': {my_mech_1: in_patterns},
                                  'targets': {my_mech_2: out_patterns}})
    print(f" Second run's time: {time.time()-start_t}")

I ran this code for different values of s1 and s2, yielding this:
| s1 | s2 | time to create composition | time for first call to learn | time for second call to learn |
|---|---|---|---|---|
| 1 | 2 | .5 sec | 1 sec | .03 sec |
| 10 | 20 | .5 sec | 1 sec | .03 sec |
| 100 | 200 | .8 sec | 8 sec | .03 sec |
| 200 | 400 | 1.8 sec | 23 sec | .03 sec |
| 200 | 800 | 3.4 sec | 85 sec | .03 sec |
I also ran s1=200, s2=1600. In that case the first call to learn ran for at least 5 minutes before I stopped execution, because Python had allocated 3.5 GB of memory.
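To put a rough number on "super-linearly", one option is to compute the local scaling exponent between consecutive rows of the table, i.e. the slope of log(time) against log(size). The sketch below assumes the number of projection weights, s1*s2, as the size measure (that choice is an assumption of this sketch, not something PsyNeuLink defines), and `local_exponents` is a hypothetical helper name. An exponent above 1 means the first-call time grows faster than the number of weights.

```python
import math

# (s1, s2, seconds for the first call to learn), copied from the table above.
# The 1/2 and 10/20 rows are left out: both took about 1 sec regardless of size.
rows = [
    (100, 200, 8.0),
    (200, 400, 23.0),
    (200, 800, 85.0),
]

def local_exponents(rows):
    """Return the log-log slope between each pair of consecutive rows.

    Size is taken to be s1 * s2 (number of weights), an assumption of this sketch.
    """
    exponents = []
    for (a1, a2, t_a), (b1, b2, t_b) in zip(rows, rows[1:]):
        size_ratio = (b1 * b2) / (a1 * a2)
        time_ratio = t_b / t_a
        exponents.append(math.log(time_ratio) / math.log(size_ratio))
    return exponents

for exponent in local_exponents(rows):
    print(f"local exponent: {exponent:.2f}")
```

With only three usable rows this is a rough estimate; timing a few more sizes would be needed to see a stable trend.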
Thanks for the report. This is addressed in #1963, which in my case brings the numbers down to:
| s1 | s2 | time to create composition | time for first call to learn | time for second call to learn |
|---|---|---|---|---|
| 200 | 800 | 0.7 sec | 1.6 sec | .03 sec |
| 400 | 1600 | 0.70 sec | 2.4 sec | .04 sec |
| 400 | 3200 | 0.78 sec | 9.1 sec | .07 sec |
| 800 | 3200 | 0.98 sec | 15.8 sec | .1 sec |
I'm not sure what the expected scaling of learn is, though, and any test of 100/4800+ is killed by the OS, possibly because it runs out of memory.
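One way to check the memory suspicion is to record the process's peak resident set size alongside the timing. The following is a minimal sketch using only the standard library; `time_and_peak_rss` is a hypothetical helper name, and the `resource` module is only available on Unix-like systems. It measures the whole process, so native allocations made by compiled backends are included, unlike `tracemalloc`, which only sees Python-level allocations.

```python
import resource
import sys
import time

def time_and_peak_rss(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds, peak RSS in MiB).

    ru_maxrss is the process-wide high-water mark, so it never decreases
    between calls; it is reported in kilobytes on Linux and bytes on macOS.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return result, elapsed, peak * bytes_per_unit / 2**20

# Stand-in workload; with the snippet from the report this would be the
# first call to learn.
result, elapsed, peak_mib = time_and_peak_rss(lambda: [0] * 1_000_000)
print(f"elapsed: {elapsed:.3f} sec, peak RSS: {peak_mib:.1f} MiB")
```

Because the value is a high-water mark, each size is best measured in a fresh Python process so earlier runs do not mask later ones.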