i-pi-dev_archive
Problem with TIMEOUT in sockets.py (reported by Vandenbrande)
The time duration of 1 second I was talking about is the time the force evaluation takes using our driver. I mention this just to stress that it is a lot less than a typical ab initio force evaluation.
If we call the duration of one force evaluation 'tau', then one simulation step with 'nbeads' beads using a single driver instance (single core) with i-PI takes just slightly longer than
tau x nbeads
This is exactly what we want: it indicates that the overhead (both calculation and communication) of i-PI is still small compared to the force evaluation. So far, so good. The problem arises when we use multiple instances of the driver. If we launch one driver on each processor, we would expect (in the ideal case) that one simulation step now takes:
tau x nbeads / nprocs
What I noticed is that the walltime remains constant, independent of the number of processors! I then noticed that by changing to 'TIMEOUT = 5.0' at the top of the file ipi/interfaces/sockets.py, a serious speed-up is obtained. I have only studied the code briefly and I do not understand this behavior, but all the examples I have run indicate that setting 'TIMEOUT = 5.0' results in the driver instances blocking most of the time.
I hope the problem is clear now.
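For what it's worth, the effect of a polling timeout is easy to reproduce outside i-PI. This is a minimal, hypothetical sketch (not the actual i-PI code): if a select() loop waits on client sockets with a given timeout and none of them happens to be ready, the loop stalls for the full timeout before it can dispatch anything else, which could explain why the value of TIMEOUT matters so much:

```python
import select
import socket
import time

def poll_once(socks, timeout):
    """Wait up to `timeout` seconds for any of `socks` to become readable."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

# An idle socket pair: nothing is ever sent, so select() blocks for
# the full timeout before returning an empty list.
a, b = socket.socketpair()
start = time.monotonic()
ready = poll_once([a], 0.2)
elapsed = time.monotonic() - start
print(ready)           # []
print(elapsed > 0.15)  # True: the loop stalled for roughly the whole timeout
a.close()
b.close()
```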
Anyway, the expected timing should be
tau x (floor(nbeads/nprocs) + 1)
Let's try to fix this in the "timings" branch, ideally before the school.
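Spelled out as code, a trivial sketch using the thread's own symbols tau, nbeads, nprocs (note that floor(nbeads/nprocs) + 1 overcounts by one force evaluation when nprocs divides nbeads exactly, where ceil(nbeads/nprocs) would be tight):

```python
def expected_step_time(tau, nbeads, nprocs):
    """Expected wall time of one step: tau x (floor(nbeads/nprocs) + 1)."""
    return tau * (nbeads // nprocs + 1)

# e.g. 16 beads on 4 drivers, with a 1-second force evaluation:
print(expected_step_time(1.0, 16, 4))   # 5.0
```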
Tests performed with 16 beads (I actually have more tests, but these are the most meaningful) on a 16-CPU Xeon machine.
IPI 1.0 RESULTS [timing plot attached]
IPI-DEV RESULTS [timing plot attached]
I think there is no need for comments... :)
:+1:
Awesome, so we can mark this as invalid in i-pi-dev and (I'd say) wontfix in i-pi 1.0, which should soon be deprecated.
In general I agree... would it be possible to simply replace the i-pi 1.0 sockets.py with the new one? That could be a quick fix that might be worth a try...
My timings for a test with 16 beads:
IPI 1.0 RESULTS [timing plot attached]
IPI-DEV RESULTS (master branch) [timing plot attached]
Clearly the scaling is improved a lot in the i-pi-dev code. Yet there is still a dependency on the value of the socket timeout. Notably, setting the timeout to 0.5 s is systematically the worst. Interestingly, this value is close to the time it takes the driver to evaluate the forces, 0.7 s. For timeout = 0.5 s I still notice bad load balancing, i.e. some driver instances doing significantly more work than others. @grhawk How long does one force evaluation take for the results you show?
Thanks for taking the time to investigate this problem!
@stevenvdb A step in my simulation takes ~1.3 s. Unfortunately I used a QM driver: the step speed changes a little during the simulation.
I don't see the same behavior in my timings... The 1.25 is a little slower for 2 and 4 drivers, but then it becomes comparable with the others... How did you measure the times? I used a simple and rough Python time.time(), starting together with the drivers and stopping as soon as the work was completed (so it also includes the time for i-PI to exit, but this should be the same in all the tests).
How many steps did you run in your tests? How did you measure the time? Could you try using a slightly slower force evaluation?
Thank you for your feedback.
Hello guys, thanks to both of you for all the efforts in debugging this. Having figured out that the most dramatic slow-down was somehow solved in the i-pi 1.0 -> i-pi-dev evolution (I actually recall doing something to fix a problem with load balancing, but cannot recall what; perhaps the history could help us, if it had not been kept so sloppily!), I'd say that if we want to make this more efficient it should not be regarded as a bugfix, but as a code optimization -- which means that first of all we have to understand deeply how the socket machinery works. To be sincere, I'd say there are more urgent things to focus on, but if this is critical for @stevenvdb, of course we will help with the optimization as much as we can.
I'd say that if we want to make this more efficient it should not be regarded as a bugfix, but as a code optimization
I agree. The i-pi-dev code works fine in the large majority of use cases. Besides, I can still manually adjust the timeout value if really necessary.
I don't see the same behavior in my timings... The 1.25 is a little slower for 2 and 4 drivers, but then it becomes comparable with the others...
Could you maybe also plot the speedup (i.e. time(1 driver) / time(n drivers))? The scale of your plot makes it look like all curves converge for 16 drivers, but plotting the speedup can reveal more subtle differences.
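Computing the speedup from the raw wall-times is a one-liner; for illustration (the numbers below are invented, not taken from the plots):

```python
# ndrivers -> wall time in seconds (invented example values)
times = {1: 11.2, 2: 5.9, 4: 3.1, 8: 1.8, 16: 1.2}

# Speedup relative to the single-driver run: time(1 driver) / time(n drivers)
speedup = {n: times[1] / t for n, t in times.items()}
for n in sorted(speedup):
    print(f"{n:2d} drivers: speedup {speedup[n]:.2f}")
```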
How did you measure the times? I used a simple and rough Python time.time(), starting together with the drivers and stopping as soon as the work was completed (so it also includes the time for i-PI to exit, but this should be the same in all the tests).
I use Python's datetime.now() and measure from the driver sending the first header to receiving the very last, then report the time for the slowest driver instance (though they are all very nearly equal).
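A minimal sketch of that measurement (the driver names and timestamps below are invented): timestamp each driver at its first send and its last receive, then report the slowest instance:

```python
from datetime import datetime

def wall_time(first_sent, last_received):
    """Seconds between a driver's first header sent and its last one received."""
    return (last_received - first_sent).total_seconds()

# Invented per-driver timestamps for illustration.
starts = {"driver0": datetime(2016, 6, 7, 12, 0, 0),
          "driver1": datetime(2016, 6, 7, 12, 0, 0)}
ends = {"driver0": datetime(2016, 6, 7, 12, 1, 5),
        "driver1": datetime(2016, 6, 7, 12, 1, 7)}

# Report the slowest driver instance, as described above.
slowest = max(wall_time(starts[d], ends[d]) for d in starts)
print(slowest)   # 67.0
```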
How many steps did you run in your tests?
50 steps, which should be enough to eliminate the influence of overhead related to starting up and shutting down.
Could you try using a slightly slower force evaluation?
The next plot is for a force evaluation that takes about 2 s. Again, there is still a small dependence on the timeout value.
This has been sitting around for ages. I'll mark it for the v2.0 milestone, but unless @stevenvdb wants to pick up the task I am inclined to close the issue, as IMO "fixing" this would imply a deep rewrite of the code in which efficiency is given more focus than code clarity.