fsharp
fsharp copied to clipboard
Optimize integer for loop code gen
#938 #9548
Optimizes for i in n .. step .. m do
il code gen to avoid allocation.
Throws ArgumentException at runtime if step is zero.
Replicates logic of OperatorIntrinsics.RangeInt32 with additional optimizations if the step is a non-zero compiled constant.
Example:
let f n step m =
for i in n..step..m do
System.Console.WriteLine()
Before:
.method public static
void f (
int32 n,
int32 step,
int32 m
) cil managed
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = (
01 00 03 00 00 00 01 00 00 00 01 00 00 00 01 00
00 00 00 00
)
// Method begins at RVA 0x2050
// Code size 59 (0x3b)
.maxstack 5
.locals init (
[0] class [System.Runtime]System.Collections.Generic.IEnumerable`1<int32>,
[1] class [System.Runtime]System.Collections.Generic.IEnumerator`1<int32>,
[2] int32,
[3] class [System.Runtime]System.IDisposable
)
IL_0000: ldarg.0
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: call class [System.Runtime]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32)
IL_0008: stloc.0
IL_0009: ldloc.0
IL_000a: callvirt instance class [System.Runtime]System.Collections.Generic.IEnumerator`1<!0> class [System.Runtime]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator()
IL_000f: stloc.1
.try
{
// loop start (head: IL_0010)
IL_0010: ldloc.1
IL_0011: callvirt instance bool [System.Runtime]System.Collections.IEnumerator::MoveNext()
IL_0016: brfalse.s IL_0026
IL_0018: ldloc.1
IL_0019: callvirt instance !0 class [System.Runtime]System.Collections.Generic.IEnumerator`1<int32>::get_Current()
IL_001e: stloc.2
IL_001f: call void [System.Console]System.Console::WriteLine()
IL_0024: br.s IL_0010
// end loop
IL_0026: leave.s IL_003a
} // end .try
finally
{
IL_0028: ldloc.1
IL_0029: isinst [System.Runtime]System.IDisposable
IL_002e: stloc.3
IL_002f: ldloc.3
IL_0030: brfalse.s IL_0039
IL_0032: ldloc.3
IL_0033: callvirt instance void [System.Runtime]System.IDisposable::Dispose()
IL_0038: endfinally
IL_0039: endfinally
} // end handler
IL_003a: ret
} // end of method _::f
public static void f(int n, int step, int m)
{
IEnumerable<int> enumerable = Operators.OperatorIntrinsics.RangeInt32(n, step, m);
IEnumerator<int> enumerator = enumerable.GetEnumerator();
try
{
while (enumerator.MoveNext())
{
int current = enumerator.Current;
Console.WriteLine();
}
}
finally
{
IDisposable disposable = enumerator as IDisposable;
if (disposable != null)
{
disposable.Dispose();
}
}
}
After:
.method public static
void f (
int32 n,
int32 step,
int32 m
) cil managed
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = (
01 00 03 00 00 00 01 00 00 00 01 00 00 00 01 00
00 00 00 00
)
// Method begins at RVA 0x2050
// Code size 53 (0x35)
.maxstack 4
.locals init (
[0] int32,
[1] int32,
[2] int32
)
IL_0000: ldarg.0
IL_0001: stloc.2
IL_0002: ldarg.1
IL_0003: stloc.1
IL_0004: ldloc.1
IL_0005: brtrue.s IL_0017
IL_0007: ldstr "The step of a range cannot be zero."
IL_000c: ldstr "step"
IL_0011: newobj instance void [mscorlib]System.ArgumentException::.ctor(string, string)
IL_0016: throw
IL_0017: ldarg.2
IL_0018: stloc.0
IL_0019: br.s IL_0024
// loop start (head: IL_0024)
IL_001b: call void [mscorlib]System.Console::WriteLine()
IL_0020: ldloc.2
IL_0021: ldloc.1
IL_0022: add
IL_0023: stloc.2
IL_0024: ldloc.2
IL_0025: ldloc.0
IL_0026: ble.s IL_002c
IL_0028: ldloc.1
IL_0029: ldc.i4.0
IL_002a: bgt.s IL_0034
IL_002c: ldloc.2
IL_002d: ldloc.0
IL_002e: bge.s IL_001b
IL_0030: ldloc.1
IL_0031: ldc.i4.0
IL_0032: bge.s IL_001b
// end loop
IL_0034: ret
}
public static void f(int n, int step, int m)
{
int num = n;
if (step == 0)
{
throw new ArgumentException("The step of a range cannot be zero.", "step");
}
while ((num <= m || step <= 0) && (num >= m || step >= 0))
{
Console.WriteLine();
num += step;
}
}
Constant step optimization:
let f n m =
for i in n..2..m do
System.Console.WriteLine()
.method public static
void f (
int32 n,
int32 m
) cil managed
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = (
01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00
)
// Method begins at RVA 0x2050
// Code size 20 (0x14)
.maxstack 4
.locals init (
[0] int32,
[1] int32
)
IL_0000: ldarg.0
IL_0001: stloc.1
IL_0002: ldarg.1
IL_0003: stloc.0
IL_0004: br.s IL_000f
// loop start (head: IL_000f)
IL_0006: call void [mscorlib]System.Console::WriteLine()
IL_000b: ldloc.1
IL_000c: ldc.i4.2
IL_000d: add
IL_000e: stloc.1
IL_000f: ldloc.1
IL_0010: ldloc.0
IL_0011: ble.s IL_0006
// end loop
IL_0013: ret
}
public static void f(int n, int m)
{
int num = n;
while (num <= m)
{
Console.WriteLine();
num += 2;
}
}
Benchmarks
Before:
Method | Start | Finish | Step | Mean | Error | StdDev | Code Size |
---|---|---|---|---|---|---|---|
Benchmark | 100 | 1000 | 10 | 24.74 ns | 0.536 ns | 1.529 ns | 465 B |
After:
Method | Start | Finish | Step | Mean | Error | StdDev | Code Size |
---|---|---|---|---|---|---|---|
Benchmark | 100 | 1000 | 10 | 4.999 ns | 0.2030 ns | 0.5985 ns | 175 B |
79.8% faster, while smaller and without allocation.
Quotation changes
Quotations and Expr structure have been changed as int32 range step for loops are no longer backed by an enumerator object, a new active pattern is added to match against stepped for loops.
Nice. Can I please ask you to add some benchmarks, so we understand what are implications in runtime here?
Comparisons - before/after, comparison with normal loop, etc.
Also:
Throws ArgumentException at runtime if step is zero.
What is current behaviour? Also, it should produce warning/error if step is constant and known at compile time.
@vzarytovskii I added benchmark results to the original pr message.
Throwing an error at runtime is a behavior replicated from the RangeInt32 enumerator type that backed the .. .. syntax. sharplab example I can look into displaying a warning/error if the step is zero however it might be better to emit such a message for any usage of an integer range step not just in a for loop.
@vzarytovskii I added benchmark results to the original pr message.
Thanks!
Throwing an error at runtime is a behavior replicated from the RangeInt32 enumerator type that backed the .. .. syntax. sharplab example I can look into displaying a warning/error if the step is zero however it might be better to emit such a message for any usage of an integer range step not just in a for loop.
Yep, agree, should be done separately for all such cases.
I've made further improvements to the codegen that results in smaller and faster il. Disassembled il, c#, and benchmarks updated.
this is ready
79.8% faster, while smaller and without allocation.
Wow! I noticed you showed timings for a loop of 90x, did you see similar behavior with larger loops? Did you compare when there is slightly more code in the body, or some reference type so that JIT cannot optimize everything away? Just curious! These timings are amazing!
@albert-du This is fantastic work
We need some more testing
-
We have to test loops near Int32.MaxValue. These can be really difficult to get right since adding the step causes the value to wrap around. It's quite possible what you have will have problems for these loops - this is tha major reason why we haven't optimized more loops previously.
-
We have to test tasks containing these loops, since you've made changes in the resumable code state machines.
-
Please make additions to
tests\walkthroughs\DebugStepping\TheBigFileOfDebugStepping.fsx
, then compile that file (add a project file to that directory if you like and add it to the solution) and test the stepping and breakpoints for these loops
@dsyme I added over/underflow checking to the constant step optimized loops, variable step loops don't seem to suffer from this as the logic was completely lifted from RangeInt32. Tests for under/overflow checking and tasks added. Debug stepping for loops looks good. Original comment updated with newer code gen for constant step loops.
benchmarks updated
@dsyme I think I'd like to try making those changes, it'll be a good opportunity for me to understand more of the compiler
That's 400% to 1740% faster!
@albert-du please don’t let this go stale, it’s amazing work! 💯 🥇 If it gets behind ‘main’ for too long, the conflicts may become hard to solve. If you need some help with seeing this through, feel free to ping me, or anybody, on F# Slack.
@abelbraaksma I've been busy with school the past few months but I fully intend to work more on this soon. Thank you so much for your support!