
Problem with the timeout in sockets.py (reported by Vandenbrande)

Open grhawk opened this issue 8 years ago • 11 comments

The duration of 1 second I was talking about is the time a force evaluation takes using our driver. I mentioned this only to stress that it is a lot less than a typical ab initio force evaluation.

If we call the duration of the force evaluation 'tau', then one i-PI simulation step with 'nbeads' beads using one driver instance (single core) takes just slightly longer than

tau x nbeads

This is exactly what we want: it indicates that the overhead (both computation and communication) of i-PI is small compared to the force evaluation. So far, so good. The problem arises when we use multiple instances of the driver. If we launch one driver on each processor, we would expect (in the ideal case) one simulation step to take:

tau x nbeads / nprocs

What I noticed is that the walltime remains constant, independent of the number of processors! I then noticed that by changing the value of TIMEOUT (the 'TIMEOUT = 5.0' at the top of the file ipi/interfaces/sockets.py), a serious speed-up is obtained. I have only studied the code briefly and I do not understand this behavior, but all the examples I have run indicate that leaving 'TIMEOUT = 5.0' results in driver instances blocking most of the time.
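
My rough understanding of the mechanism (a schematic sketch only, not the actual sockets.py code) is that the interface waits on the driver sockets with that constant as the select timeout, something like:

```python
import select

TIMEOUT = 5.0  # the module-level constant discussed above

def poll_drivers(driver_sockets):
    # Schematic polling step: block until a driver socket is readable,
    # or until TIMEOUT expires. With a large TIMEOUT, a driver that has
    # finished its force evaluation can sit idle for up to TIMEOUT
    # seconds before the next bead is dispatched to it.
    readable, _, _ = select.select(driver_sockets, [], [], TIMEOUT)
    return readable
```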

I hope the problem is clear now.

grhawk avatar May 12 '16 06:05 grhawk

Anyway, the expected timing should really be

tau x ceil(nbeads / nprocs)
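
For example, with an ideal dispatch this gives (illustrative numbers):

```python
import math

tau = 1.0  # seconds per force evaluation (illustrative)
nbeads = 16
for nprocs in (1, 2, 4, 8, 16):
    print(nprocs, tau * math.ceil(nbeads / float(nprocs)))
# 1 -> 16.0 s, 2 -> 8.0 s, 4 -> 4.0 s, 8 -> 2.0 s, 16 -> 1.0 s
```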

grhawk avatar May 12 '16 06:05 grhawk

Let's try to fix this in the "timings" branch, ideally before the school.

ceriottm avatar May 29 '16 19:05 ceriottm

Tests performed with 16 beads (I actually have more tests, but these are the most meaningful) on a 16-CPU Xeon machine.

IPI 1.0 RESULTS

[plot: ipi_results]

IPI-DEV RESULTS

[plot: ipidev_results]

I think there is no need for comments... :)

grhawk avatar Jun 06 '16 15:06 grhawk

:+1:

tomspur avatar Jun 06 '16 16:06 tomspur

Awesome, so we can mark this as invalid in i-pi-dev and (I'd say) wontfix in i-pi 1.0, which should soon be deprecated.

ceriottm avatar Jun 06 '16 18:06 ceriottm

In general I agree... but would it be possible to simply replace the i-pi 1.0 sockets.py with the new one? That could be a quick fix that might be worth a try...

grhawk avatar Jun 06 '16 19:06 grhawk

My timings for a test with 16 beads:

IPI 1.0 RESULTS

[plot: beads_16]

IPI-DEV RESULTS (master branch)

[plot: beads_16]

Clearly the scaling is improved a lot using the i-pi-dev code. Yet there is still a dependence on the value of the socket time-out. Notably, setting the time-out to 0.5 s is systematically the worst. Interestingly, this value is close to the time it takes the driver to evaluate the forces, 0.7 s. For the 0.5 s time-out I still notice bad load balancing, i.e. some driver instances doing significantly more work than others. @grhawk How long does one force evaluation take for the results you show?

Thanks for taking the time to investigate this problem!

stevenvdb avatar Jun 07 '16 13:06 stevenvdb

@stevenvdb A step in my simulation takes ~1.3 s. Unfortunately I used a QM driver: the step speed changes a little during the simulation.

I don't see the same behavior in my timings... The 1.25 time-out is a little slower for 2 and 4 drivers, but then it becomes comparable with the others... How did you measure the times? I used a simple and rough Python time.time, starting together with the drivers and stopping as soon as the work was completed (so it also includes the time it takes i-PI to exit, but this should be the same in all the tests).

How many steps did you do in your tests? Could you try using a slightly slower force evaluation?

Thank you for your feedback

grhawk avatar Jun 07 '16 17:06 grhawk

Hello guys, thanks to both of you for all the effort in debugging this. Having figured out that the most dramatic slow-down was somehow solved in the i-pi-1 -> i-pi-dev evolution (I actually recall doing something to fix a problem with load balancing, but cannot recall what - perhaps the history could help us, had it not been kept so sloppily!), I'd say that if we want to make this more efficient it should not be regarded as a bugfix, but as a code optimization -- which means that first of all we have to understand deeply how the socket machinery works. To be honest, I'd say there are more urgent things to focus on, but if this is critical for @stevenvdb, of course we will help with the optimization as much as we can.

ceriottm avatar Jun 07 '16 19:06 ceriottm

I'd say that if we want to make this more efficient it should not be regarded as a bugfix, but as a code optimization

I agree. The i-pi-dev code works fine in the large majority of use cases. Besides, I can still manually adjust the time-out value if really necessary.

I don't see the same behavior in my timings... The 1.25 time-out is a little slower for 2 and 4 drivers, but then it becomes comparable with the others...

Could you maybe also plot the speedup (i.e. time(1 driver) / time(n drivers))? The scale of your plot makes it look like all curves converge for 16 drivers, but plotting the speedup can reveal more subtle differences.
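
Something along these lines would do (the walltimes below are made-up placeholders, just to show the idea):

```python
import matplotlib.pyplot as plt

# Made-up placeholder walltimes (seconds) per number of drivers,
# one series per time-out setting -- substitute the measured values.
walltimes = {
    "time-out 0.5 s": {1: 21.0, 2: 11.8, 4: 6.9, 8: 4.6, 16: 3.8},
    "time-out 5.0 s": {1: 21.0, 2: 10.9, 4: 5.7, 8: 3.1, 16: 1.9},
}

for label, t in sorted(walltimes.items()):
    ndrivers = sorted(t)
    speedup = [t[1] / t[n] for n in ndrivers]  # time(1 driver) / time(n drivers)
    plt.plot(ndrivers, speedup, marker="o", label=label)

plt.xlabel("number of drivers")
plt.ylabel("speedup")
plt.legend()
plt.show()
```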

How did you measure the times? I used a simple and rough Python time.time, starting together with the drivers and stopping as soon as the work was completed (so it also includes the time it takes i-PI to exit, but this should be the same in all the tests).

I use Python's datetime.now() and measure from the driver sending the first header to it receiving the very last, then report the time for the slowest driver instance (but they are all very nearly equal).
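
Roughly like this, where run_driver() is just a stand-in for the actual driver loop:

```python
from datetime import datetime

def run_driver():
    # Stand-in for the real driver: connect to i-PI, then exchange
    # headers, positions and forces until the simulation is done.
    pass

start = datetime.now()
run_driver()
elapsed = (datetime.now() - start).total_seconds()
print("walltime for this driver instance: %.2f s" % elapsed)
```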

How many steps did you do in your tests?

50 steps, which should be enough to eliminate the influence of overhead related to starting up and shutting down.

Could you try using a slightly slower force evaluation?

The next plot is for a force evaluation that takes about 2 s. Again, there is still a small dependence on the time-out value.

[plot: beads_16]

stevenvdb avatar Jun 09 '16 08:06 stevenvdb

This has been sitting around for ages - I'll mark it for the v2.0 milestone, but unless @stevenvdb wants to pick up the task I am inclined to close the issue, as IMO "fixing" this would imply a deep rewrite of the code in which efficiency is given more focus than code clarity.

ceriottm avatar Mar 16 '17 08:03 ceriottm