LPATHBench icon indicating copy to clipboard operation
LPATHBench copied to clipboard

C implementation for addition to benchmark

Open nkurz opened this issue 10 years ago • 5 comments

I offer a version written in C for inclusion in the benchmark: https://gist.github.com/nkurz/5e49ba0ddb04e23de03f

I've only tested on Haswell under Linux, but results are good when compiled with gcc-4.8 -O2: // 8981 LANGUAGE C 623 // 8981 LANGUAGE C++/clang 734 // 8981 LANGUAGE C++/gcc 755

I have tested, but I think the implementation should continue to perform well with larger graph sizes.

nkurz avatar Dec 22 '14 23:12 nkurz

Cool, I've added it to the makefile and run script. I found the Use Highbit one to be fastest.

logicchains avatar Dec 23 '14 11:12 logicchains

It might be interesting to add the following x64 benchmarks to the writeup. These were done on a Haswell i7-4770 CPU @ 3.40GHz (Turbo Off, Hyperthreading Off) running Linux 3.13, which I think is a representative current generation desktop processor.

nate@haswell:~/git/LPATHBench$ make run-all for prog in "fs.exe" "./cpp_gcc" "./cpp_clang" "./cpp_cached" "java jv" "./ml" "./lisp" "./rs" "./rs_unsafe" "./go" "./gccgo" "./d-dmd" "./nim" "luajit lj.lua" "./f03" "./c-highbit" "./c-bitmap" "./c-branchless" ; do $prog; done

8981 LANGUAGE C++Cached 18 8981 LANGUAGE C/branchless 618 8981 LANGUAGE C/highbit 749 8981 LANGUAGE C++/gcc 755 8981 LANGUAGE C++/clang 735 8981 LANGUAGE Nim 756 8981 LANGUAGE Rust 877 8981 LANGUAGE RustUnsafe 869 8981 LANGUAGE C/bitmap 899 8981 LANGUAGE Java 909 8981 LANGUAGE Fortran_2003 986 8981 LANGUAGE Ocaml 997 8981 LANGUAGE Lisp 1004 8981 LANGUAGE Go 1021 8981 LANGUAGE FSharp 1088 8981 LANGUAGE D 1203 8981 LANGUAGE GCCGo 1502 8981 LANGUAGE LuaJit 2088

My main conclusions would be:

  1. Modern processors have improved more than clockspeed alone might imply.
  2. Implementation and algorithm probably matter more than choice of language.

I feel the C/branchless implementation I did is well optimized for this processor. My guess would be that assembly tweaks might get it down to 500 ms, but probably not much more. It's fairly incredible to me that the much higher level languages are able to stay as close to this speed as they do.

nkurz avatar Dec 26 '14 22:12 nkurz

I added them. I don't suppose you have a different D compiler handy? Dmd is generally the slowest by a significant margin, so it's not the fairest for benchmarking

While language doesn't seem to matter, note that fast code was much easier to write in D/C++/Rust than in the managed languages. C# for instance required changing a foreach-style loop to a for(i = 0; i< len; i++) style loop, as C# iterators aren't cost-free. Java required changing the node class to an []int, to avoid boxing. Common Lisp required adding many type annotations and it wasn't initially obvious which to add.

Still pretty impressive how fast the higher-level languages are though.

On 27 December 2014 at 09:01, nkurz [email protected] wrote:

It might be interesting to add the following x64 benchmarks to the writeup. These were done on a Haswell i7-4770 CPU @ 3.40GHz (Turbo Off, Hyperthreading Off) running Linux 3.13, which I think is a representative current generation desktop processor.

nate@haswell:~/git/LPATHBench$ make run-all for prog in "fs.exe" "./cpp_gcc" "./cpp_clang" "./cpp_cached" "java jv" "./ml" "./lisp" "./rs" "./rs_unsafe" "./go" "./gccgo" "./d-dmd" "./nim" "luajit lj.lua" "./f03" "./c-highbit" "./c-bitmap" "./c-branchless" ; do $prog; done

8981 LANGUAGE C++Cached 18 8981 LANGUAGE C/branchless 618 8981 LANGUAGE C/highbit 749 8981 LANGUAGE C++/gcc 755 8981 LANGUAGE C++/clang 735 8981 LANGUAGE Nim 756 8981 LANGUAGE Rust 877 8981 LANGUAGE RustUnsafe 869 8981 LANGUAGE C/bitmap 899 8981 LANGUAGE Java 909 8981 LANGUAGE Fortran_2003 986 8981 LANGUAGE Ocaml 997 8981 LANGUAGE Lisp 1004 8981 LANGUAGE Go 1021 8981 LANGUAGE FSharp 1088 8981 LANGUAGE D 1203 8981 LANGUAGE GCCGo 1502 8981 LANGUAGE LuaJit 2088

My main conclusions would be:

  1. Modern processors have improved more than clockspeed alone might imply.
  2. Implementation and algorithm probably matter more than choice of language.

I feel the C/branchless implementation I did is well optimized for this processor. My guess would be that assembly tweaks might get it down to 500 ms, but probably not much more. It's fairly incredible to me that the much higher level languages are able to stay as close to this speed as they do.

— Reply to this email directly or view it on GitHub https://github.com/logicchains/LPATHBench/issues/53#issuecomment-68160081 .

logicchains avatar Dec 27 '14 11:12 logicchains

While language doesn't seem to matter, note that fast code was much easier to write in D/C++/Rust than in the managed languages.

I'm not sure. I'd guess there are more ways to do have poor performance in each language than there are to do it fast, and it will always be a debate as to which approach is most idiomatic. Also, I think that choice of which is fastest will depend on the target architecture and change with compiler/interpreter versions. It would be interesting to have many versions written in each language to show the spread of performance within that language for plausible implementations, and then plot with the overlaps.

I don't suppose you have a different D compiler handy? Dmd is generally the slowest by a significant margin, so it's not the fairest for benchmarking

I'm installing compilers to try this, and can easily try to install others. I started with gdc-4.8, but failed with the following error:

gdc d.d -o d-ddc -O3 -frelease -finline -fno-bounds-check
d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.algorithm.splitter!(string, string).splitter'
d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.array.array!(Result).array'

I didn't know if this was a version problem, something easy to change in the source, or a known incompatibility. I can try again if you suggest a fix.

But just tried ldc2, and indeed the results are among the fastest

8981 LANGUAGE D 1201  // dmd DMD64 D Compiler v2.066.1
8981 LANGUAGE D 739 // ldc2 0.15.1 (DMD v2.066.1, LLVM 3.5.0)

While I've got you, one more language I'm unfamiliar with but had errors was Julia.

nate@haswell:~/git/LPATHBench$ julia -v
julia version 0.2.1
nate@haswell:~/git/LPATHBench$ julia julia.jl
ERROR: no method Node(Array{None,1},)
 in readPlaces at /home/nate/git/LPATHBench/julia.jl:16
 in open at io.jl:342
 in main at /home/nate/git/LPATHBench/julia.jl:45
 in include at boot.jl:238
 at /home/nate/git/LPATHBench/julia.jl:55

Do you know if this means I should move up or down in version? Or if there is an easy fix?

nkurz avatar Dec 27 '14 18:12 nkurz

It would be interesting to have many versions written in each language to show the spread of performance within that language for plausible implementations, and then plot with the overlaps.

With the next benchmark I might take a more functional approach like this, leaving older versions in rather than replacing them. The issue would be deciding which versions were worth keeping, and how to give them meaningful names so that readers could distinguish between two different, similar implementations in the same language.

That gdc issue could be fixed either by updating gdc, or by removing all 'pure' annotations from the code. Presumably 4.8 has different, impure implementations of a couple of functions that are marked pure in more recent versions, so it's not possible to mark a function calling those functions pure in 4.8.

I don't have much experience with Julia either; someone else submitted that implementation. So the only way to fix it would probably be to upgrade; I'm currently using version 0.3.3, which I believe has had a fair few changes since 0.2.1.

On 28 December 2014 at 05:39, nkurz [email protected] wrote:

While language doesn't seem to matter, note that fast code was much easier to write in D/C++/Rust than in the managed languages.

I'm not sure. I'd guess there are more ways to do have poor performance in each language than there are to do it fast, and it will always be a debate as to which approach is most idiomatic. Also, I think that choice of which is fastest will depend on the target architecture and change with compiler/interpreter versions. It would be interesting to have many versions written in each language to show the spread of performance within that language for plausible implementations, and then plot with the overlaps.

I don't suppose you have a different D compiler handy? Dmd is generally the slowest by a significant margin, so it's not the fairest for benchmarking

I'm installing compilers to try this, and can easily try to install others. I started with gdc-4.8, but failed with the following error:

gdc d.d -o d-ddc -O3 -frelease -finline -fno-bounds-check d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.algorithm.splitter!(string, string).splitter' d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.array.array!(Result).array'

I didn't know if this was a version problem, something easy to change in the source, or a known incompatibility. I can try again if you suggest a fix.

But just tried ldc2, and indeed the results are among the fastest

8981 LANGUAGE D 1201 // dmd DMD64 D Compiler v2.066.1 8981 LANGUAGE D 739 // ldc2 0.15.1 (DMD v2.066.1, LLVM 3.5.0)

While I've got you, one more language I'm unfamiliar with but had errors was Julia.

nate@haswell:~/git/LPATHBench$ julia -v julia version 0.2.1 nate@haswell:~/git/LPATHBench$ julia julia.jl ERROR: no method Node(Array{None,1},) in readPlaces at /home/nate/git/LPATHBench/julia.jl:16 in open at io.jl:342 in main at /home/nate/git/LPATHBench/julia.jl:45 in include at boot.jl:238 at /home/nate/git/LPATHBench/julia.jl:55

Do you know if this means I should move up or down in version? Or if there is an easy fix?

— Reply to this email directly or view it on GitHub https://github.com/logicchains/LPATHBench/issues/53#issuecomment-68186445 .

logicchains avatar Dec 28 '14 09:12 logicchains