
runtime performance

mratsim opened this issue 6 years ago • 5 comments

  • [ ] I don't think there is much of a speed difference between D and Nim for general code when GC'ed types are not involved.

  • [ ] One perf gotcha is that by default Nim seqs and strings have value semantics, so assignment will deep copy.
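To make the gotcha concrete, here is a minimal sketch in plain Nim (classic default semantics, no `sink`/ARC optimizations assumed) showing that assigning a seq copies the whole buffer:

```nim
# Value semantics: assigning a seq (or string) deep-copies the payload,
# so mutating the copy never affects the original.
var a = @[1, 2, 3]
var b = a            # full copy of the 3-element buffer, not a reference

b[0] = 99
assert a == @[1, 2, 3]   # `a` is untouched
assert b == @[99, 2, 3]
```

When the copy is unwanted, passing the seq as a `var` parameter (or using `move` on recent Nim) avoids it.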

I think performance all comes down to data structures + programmer familiarity with the language + time spent.

Now there are domain-specific considerations that can make a huge difference, and most Nim library authors publish extensive benchmarks of their solutions compared to their mainstream alternatives.

A generic need in the wild: parsing files

  • [x] There is the Faster Command Line Tools in <insert language> benchmark that was started by the D community; Nim replicated it too. TL;DR: D and Nim had the same speed and the same compilation time. To be honest, the fastest CSV parser I have used (to parse GBs of machine learning datasets) is xsv, in Rust.

Domain specific

Http server:

  • [x] Mofuw by @2vg is faster than tokio-minihttp, the current #1 on the TechEmpower benchmarks.

Functional programming

  • [x] Zero_functional is currently number 1 or 2 against 9 other languages, with Rust holding the other top spot. Zero_functional fuses loops at compile time when chaining zip/map/filter/reduce functional constructs.
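To illustrate what "fusing loops" buys you, here is a hand-written sketch of the single loop such a macro can generate for a map/filter/reduce chain (an illustration of the technique, not zero_functional's actual output):

```nim
let data = @[1, 2, 3, 4, 5]

# Naive chaining would allocate and traverse an intermediate seq for each
# of map, filter, and reduce. The fused version does all three in one pass
# with no intermediate allocations:
var acc = 0
for x in data:
  let mapped = x * 2        # map(it * 2)
  if mapped > 4:            # filter(it > 4)
    acc += mapped           # reduce with `+`

assert acc == 24            # 6 + 8 + 10
```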

Numerical/scientific computing

This is my domain so I know much more about it.

  • [ ] D has the advantage of having access to the register size and the L1/L2 cache sizes at compile time when using LDC; this is important for truly generic code.

  • [ ] D does not have access to restrict and __builtin_assume_aligned, which are necessary to reach Fortran speed when operating on arrays and tensors.

  • [ ] D cannot disable (?) the GC at specific points.

Open questions

  • [ ] Does D have an alternative to closures that can inline a proc passed to higher-order functions like map?

  • [ ] Can D arrays be parametrized with a compile-time proc? For example, for an efficient parallel reduction you need to create an intermediate array of N elements (N being your number of cores). It should be padded so that the elements do not sit in the same cache line (64 B on most CPUs), to avoid false sharing/cache invalidation. For a type T I need something like:

        var results {.align64, noInit.}: array[min(T.sizeof, OPENMP_NB_THREADS * maxItemsPerCacheLine), T]
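For the padded-reduction point above, here is a non-generic sketch of the same idea in plain Nim (`NumThreads` and the 64-byte line size are assumptions, and `float64` stands in for the generic T):

```nim
const
  CacheLine  = 64    # bytes per cache line (typical on current x86/ARM CPUs)
  NumThreads = 8     # hypothetical core count

type
  PaddedAccum = object
    value: float64
    # Round each accumulator up to a full cache line:
    pad: array[CacheLine - sizeof(float64), byte]

# One accumulator per thread; each sits on its own cache line, so threads
# writing their partial sums never invalidate each other's lines.
var partials: array[NumThreads, PaddedAccum]

proc finalReduce(): float64 =
  ## Sequential final pass over the per-thread partial sums.
  for p in partials:
    result += p.value
```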
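On the Nim side of the closure-inlining question above: `mapIt` from std/sequtils is a template, so the expression is spliced directly into the generated loop rather than captured in a closure:

```nim
import std/sequtils

# `mapIt` is a template: `it * 2` is inlined into the loop body,
# so there is no closure object and no indirect call.
let doubled = @[1, 2, 3].mapIt(it * 2)
assert doubled == @[2, 4, 6]
```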

mratsim avatar Mar 24 '18 12:03 mratsim

Thanks! PRs welcome to incorporate your points so they don't get lost!

timotheecour avatar Mar 27 '18 20:03 timotheecour

Added https://github.com/timotheecour/D_vs_nim/commit/208b717c49fe452672df5074eb597f71f2aac61f to address some points above (marking them as checked).

timotheecour avatar Mar 27 '18 20:03 timotheecour

/cc @mratsim

> D cannot disable (?) the GC at specific point.

What do you mean? See https://dlang.org/library/core/memory/gc.disable.html

timotheecour avatar Mar 27 '18 20:03 timotheecour

@timotheecour it has some limitations though: "Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition."

Yardanico avatar Mar 27 '18 20:03 Yardanico

@timotheecour hence the question mark ;) I don't know D.

By the way, regarding "D has the advantage of having access to register size and L1 or L2 cache size at compile-time when using LDC": this was used in Mir, full article here. I have also seen L1-cache-size-dependent code in the mir library.

mratsim avatar Mar 28 '18 11:03 mratsim