ffi-overhead icon indicating copy to clipboard operation
ffi-overhead copied to clipboard

Unroll the loops?

Open munificent opened this issue 6 years ago • 4 comments

Since you benchmark the entire program elapsed time, the loop itself — evaluating the exit condition, looking up the variable, and jumping back or out — adds to the execution time. The body of the loop is tiny, just a single FFI call, so the loop overhead is very likely significant, especially in the interpreted languages.

A simple fix is to unroll the loop several times. Maybe something like (here's the Wren one):

class FFI {
    foreign static plusone(x)
    foreign static count()
    foreign static start()
    foreign static stop()
}
var x = 0
var count = FFI.count()
// try call
FFI.plusone(x)

FFI.start()
while (x < count) {
    x = FFI.plusone(FFI.plusone(FFI.plusone(FFI.plusone(FFI.plusone(x)))))
}
FFI.stop()

My hunch is that you'll see quite different relative numbers if you do this for all the languages. Right now, your benchmarks probably show the relative performance of assigning variables and looping more than they do FFI.

munificent avatar May 29 '18 16:05 munificent

Hey @munificent Good idea. That can be implemented as a flag from extra cli args. Maybe having 10 ffi calls per loop might be better? (easier to digest)

I like wren btw :-)

dyu avatar May 30 '18 01:05 dyu

Maybe having 10 ffi calls per loop might be better? (easier to digest)

You mean like:

while (x < count) {
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
}

?

That works too, though you get a little extra overhead from the assignment and variable lookup. Normally, this kind of stuff doesn't matter much, but since FFIs are generally pretty fast, this kind of tiny code can actually dwarf it.

I like wren btw :-)

Thanks, me too! 😉

munificent avatar May 30 '18 18:05 munificent

Well, more like your original suggestion (chaining) but with 10 calls instead of 5.

while (x < count) {
    x = FFI.plusone(
        FFI.plusone(
            FFI.plusone(
                FFI.plusone(
                    FFI.plusone(
                        FFI.plusone(
                            FFI.plusone(
                                FFI.plusone(
                                    FFI.plusone(
                                        FFI.plusone(x)
                                    )
                                )
                            )
                        )
                    )
                )
            )
        )
    )
}

dyu avatar May 31 '18 00:05 dyu

That sounds good to me. :)

munificent avatar Jun 04 '18 17:06 munificent