closure iterators are much slower with ARC/ORC
What happened?
Here I get an 8x slowdown when I switch to --mm:arc:
import std/sugar
import benchy

proc toIter*[T](s: Slice[T]): iterator: T =
  iterator it: T {.closure.} =
    for x in s.a..s.b:
      yield x
  return it

proc filter*[T](i: iterator: T, f: proc(x: T): bool): iterator: T =
  iterator it: T {.closure.} =
    for x in i():
      if f(x):
        yield x
  result = it

iterator filter*[T](i: iterator: T, f: proc(x: T): bool): T =
  for x in i():
    if f(x):
      yield x

timeIt "closure iterator":
  var acc = 0
  for i in (1..100_000).
      toIter.
      filter(x => x mod 2 == 0).
      filter(x => x mod 4 == 0).
      filter(x => x mod 8 == 0).
      filter(x => x mod 16 == 0).
      filter(x => x mod 32 == 0).
      filter(x => x mod 64 == 0).
      filter(x => x mod 128 == 0).
      filter(x => x mod 256 == 0).
      filter(x => x mod 512 == 0):
    acc.inc i
Nim Version
Nim Compiler Version 1.7.1 [Windows: amd64]
Compiled at 2022-07-17
Copyright (c) 2006-2022 by Andreas Rumpf
active boot switches: -d:release
Current Standard Output Logs
with ARC (nim --mm:arc -d:release r .\play.nim):

  min time   avg time   std dv    runs  name
  4.022 ms   4.352 ms   ±0.207   x1000  closure iterator

with refc (nim -d:release r .\play.nim):

  min time   avg time   std dv    runs  name
  0.841 ms   0.924 ms   ±0.045   x1000  closure iterator
Expected Standard Output Logs
almost the same numbers
Additional Information
The numbers are almost the same with version 1.6.6; I will try to bisect the regression.
This is because devel enabled --threads:on by default. Closure iterators are much slower with --threads:on under ARC/ORC. Use --threads:off as an optimization for now.
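Concretely, the workaround is to add --threads:off to the compile command already used above (the file name play.nim is taken from the commands in the logs):

```shell
# Same ARC build as in the report, but without the threading runtime:
nim r --mm:arc --threads:off -d:release play.nim
```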
I did a profile again: the thread-specific logic increases the time, and mimalloc doesn't help here.
I've seen slightly slower performance in multiple places, even for simple programs. Not as extreme as 8x, but still; I wonder if the --threads:on default should be reconsidered. 🤔
Out of curiosity, what's the underlying cause?
I wish I knew; I suspect a terrible "thread local storage" implementation.
It is not related to ARC/ORC; I can reproduce it on Linux with --mm:none under both --threads:on and --threads:off.
What would be interesting is a comparison between Nim and other (compiled) languages with similar closure semantics/mechanisms. Then we might be able to uncover ways to optimize the currently generated code.
It's not very interesting, Nim always allocates but often enough the closures really do escape so that it cannot be optimized out. Where the closures don't escape idiomatic Nim already uses templates.
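To illustrate the template alternative mentioned above: a minimal sketch using filterIt from std/sequtils, a real stdlib template that inlines the predicate at the call site, so no closure environment is allocated (the example values are illustrative):

```nim
import std/sequtils

# Where the predicate does not need to escape, idiomatic Nim uses a
# template such as `filterIt`: `it` is substituted inline, so no
# closure is heap-allocated, unlike the closure-iterator pipeline.
let evens = toSeq(1..10).filterIt(it mod 2 == 0)
echo evens  # @[2, 4, 6, 8, 10]
```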
It's the thread local storage emulation or something similar.
Hm, how slow are Nim's closures compared to, say, Go?
With --threads:off my timings are 0.571 ms for ARC and 0.569 ms for refc. Close enough.
So using --threads:off is recommended? 🤔
If you don't use threads, yes.
Then maybe --threads:on should be reconsidered as the default. 🤔
No, people should exploit threads in order to get performance instead. shrug