closure iterators are much slower with ARC/ORC

Open hamidb80 opened this issue 1 year ago • 2 comments

What happened?

Here I get an 8x slowdown when I switch to --mm:arc.

import std/sugar
import benchy

proc toIter*[T](s: Slice[T]): iterator: T =
  iterator it: T {.closure.} =
    for x in s.a..s.b:
      yield x
  return it

proc filter*[T](i: iterator: T, f: proc(x: T): bool): iterator: T =
  iterator it: T {.closure.} =
    for x in i():
      if f(x):
        yield x
  result = it

iterator filter*[T](i: iterator: T, f: proc(x: T): bool): T =
  for x in i():
    if f(x):
      yield x


timeIt "closure iterator":
  var acc = 0

  for i in (1..100_000).
    toIter.
    filter(x => x mod 2 == 0).
    filter(x => x mod 4 == 0).
    filter(x => x mod 8 == 0).
    filter(x => x mod 16 == 0).
    filter(x => x mod 32 == 0).
    filter(x => x mod 64 == 0).
    filter(x => x mod 128 == 0).
    filter(x => x mod 256 == 0).
    filter(x => x mod 512 == 0):

    acc.inc i

Nim Version

Nim Compiler Version 1.7.1 [Windows: amd64]
Compiled at 2022-07-17
Copyright (c) 2006-2022 by Andreas Rumpf

active boot switches: -d:release

Current Standard Output Logs

with ARC: nim --mm:arc -d:release r .\play.nim:

min time    avg time  std dv   runs name
4.022 ms    4.352 ms  ±0.207  x1000 closure iterator

with refC: nim -d:release r .\play.nim:

min time    avg time  std dv   runs name
0.841 ms    0.924 ms  ±0.045  x1000 closure iterator

Expected Standard Output Logs

almost the same numbers

Additional Information

The numbers are almost the same with version 1.6.6. I will try to bisect the regression.

hamidb80, Aug 08 '22 03:08

This is because devel enabled threads:on by default. Closure iterators are much slower with threads:on under ARC/ORC. Use threads:off as an optimization for now.

ringabout, Aug 08 '22 03:08
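
To confirm which settings a given build actually used, a small check like the one below may help. It is only a sketch: it relies on the built-in `compileOption` and `defined` checks, and the names `threadsOn` and `memoryModel` are illustrative rather than taken from the issue.

```nim
# Report the compile-time settings that matter for this issue.
# `threadsOn` and `memoryModel` are hypothetical names chosen for this sketch.
const threadsOn = compileOption("threads")

const memoryModel =
  when defined(gcOrc): "orc"
  elif defined(gcArc): "arc"
  else: "other (for example refc)"

echo "threads: ", (if threadsOn: "on" else: "off")
echo "memory management: ", memoryModel
```

Building the same file once with `--threads:on` and once with `--threads:off` should show which variant a benchmark run was really measuring.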

I profiled it again.

Thread-specific logic increases the time, and mimalloc doesn't help with this.

ringabout, Aug 08 '22 03:08

I have seen slightly slower performance in multiple places, even for simple programs. It is not as extreme as 8x, but still, I wonder if the threads:on default should be reconsidered. 🤔

juancarlospaco, Oct 20 '22 17:10

Out of curiosity, what's the underlying cause?

Varriount, Oct 20 '22 18:10

I wish I knew. I suspect a terrible "thread local storage" implementation.

Araq, Oct 21 '22 08:10

It is not related to ARC/ORC, I can repro on Linux with --mm:none and --threads:on / --threads:off.

juancarlospaco, Oct 21 '22 10:10
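
For reproducing this on other setups without the benchy dependency, a reduced version of the benchmark could look like the sketch below. It is not the original test case: `countUp`, `keepIf`, and `run` are hypothetical names, only one filter stage is chained, and the timing is a single run measured with `std/monotimes`, so the figures are only rough.

```nim
import std/[monotimes, times]

# Closure iterator yielding 1..n (same shape as `toIter` in the report).
proc countUp(n: int): iterator(): int =
  iterator it(): int {.closure.} =
    for x in 1 .. n:
      yield x
  return it

# Closure iterator wrapping another one and keeping matching items.
proc keepIf(src: iterator(): int, pred: proc(x: int): bool): iterator(): int =
  iterator it(): int {.closure.} =
    for x in src():
      if pred(x):
        yield x
  return it

proc run(): int =
  let evens = keepIf(countUp(100_000), proc(x: int): bool = x mod 2 == 0)
  let wanted = keepIf(evens, proc(x: int): bool = x mod 512 == 0)
  for x in wanted():
    result += x

let start = getMonoTime()
let total = run()
let elapsed = getMonoTime() - start
echo "sum = ", total, ", elapsed = ", elapsed.inMicroseconds, " us"
```

Compiling it with `--threads:on` and `--threads:off` under the same `--mm` setting should make it possible to compare the two configurations directly.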

What would be interesting is a comparison between Nim and other (compiled) languages with similar closure semantics/mechanisms. Then we might be able to uncover ways to optimize the currently generated code.

Varriount, Oct 21 '22 19:10

It's not very interesting. Nim always allocates, but often enough the closures really do escape, so the allocation cannot be optimized out. Where the closures don't escape, idiomatic Nim already uses templates.

It's the thread local storage emulation or something similar.

Araq, Oct 21 '22 19:10
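
As an illustration of the template-based style mentioned above for closures that do not escape, the filtering loop from the report could be written without any closure iterator. This is a sketch under assumptions: `multiplesOf` and `sumWhere` are hypothetical names, and the injected `it` merely follows the convention used by `std/sequtils` templates such as `filterIt`.

```nim
# Inline iterator: expanded at the call site, so no closure environment
# has to be allocated for it.
iterator multiplesOf(s: Slice[int], k: int): int {.inline.} =
  for x in s.a .. s.b:
    if x mod k == 0:
      yield x

# Template taking the predicate as an expression over the injected `it`.
template sumWhere(s: Slice[int], pred: untyped): int =
  let bounds = s
  var acc = 0
  for it {.inject.} in bounds.a .. bounds.b:
    if pred:
      acc += it
  acc

var viaIterator = 0
for x in multiplesOf(1 .. 100_000, 512):
  viaIterator += x

let viaTemplate = sumWhere(1 .. 100_000, it mod 512 == 0)
echo viaIterator, " ", viaTemplate
```

Both variants compute the same sum as the chained filters in the report, since a number divisible by 512 passes every earlier stage as well.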

Hm, how slow are Nim's closures compared to, say, Go?

Varriount, Oct 21 '22 22:10

With --threads:off my timings are: 0.571 ms for ARC and 0.569 ms for refc. Close enough.

Araq, Mar 28 '23 13:03

So using --threads:off is recommended? 🤔

juancarlospaco, Mar 28 '23 13:03

If you don't use threads, yes.

Araq, Mar 28 '23 15:03
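
For a single-threaded project, the switch does not have to be passed on every command line. One possible setup, assuming a `config.nims` file placed next to the main module, is sketched here:

```nim
# config.nims (NimScript), picked up automatically when compiling in this directory.
# Only appropriate if the program does not use threads.
switch("threads", "off")
```

Passing `--threads:off` directly to `nim c` has the same effect for a one-off build.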

Then maybe --threads:on should be reconsidered as default. 🤔

juancarlospaco, Mar 28 '23 15:03

No, people should exploit threads in order to get performance instead. 🤷

Araq, Mar 28 '23 16:03