Jack Mott

91 comments by Jack Mott

OK, I seem to have convinced myself that the random values vs. constant values difference wasn't really happening anyway.

@cloudRoutine am I interpreting that BlitParser stuff correctly? They are outputting IL by hand there? That is rad.

So it looks like removing the call to the inline `getLeftovers` function and just writing its body in place by hand is a big win. I will experiment with that more to be sure, though.

So the primary loop for map is : ``` f# while i (array,i ))).CopyTo(result,i) i , valuetype [System.Numerics.Vectors]System.Numerics.Vector`1>::Invoke(!0) IL_0057: stloc.s v IL_0059: ldloca.s v IL_005b: ldloc.1 IL_005c: ldloc.s i IL_005e:...

Is it possible to roll our own `CopyTo` without the checks? https://github.com/dotnet/corefx/blob/future/src/System.Numerics.Vectors/src/System/Numerics/Vector.cs#L777 It doesn't seem like `JitIntrinsic` is an attribute we get to use :( It is possible the `JitIntrinsic` magic elides all...

Oh wow, using a tail-recursive main loop instead of a while loop is consistently faster. This test was done with an array of doubles (each vector holds half as many of them as int32s)...
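The two loop shapes being compared can be sketched roughly as below. This is a minimal illustration, not the library's actual code: `mapWhile`, `mapRec`, and `mapTail` are hypothetical names, the element type is fixed to `float`, and the leftover handling is a simple padded buffer chosen only to keep the example self-contained. It makes no claim about which version is faster on a given runtime.

```fsharp
open System.Numerics

/// Hypothetical helper: maps the trailing elements that do not fill a whole
/// vector by padding them into a temporary vector-sized buffer.
let mapTail (f: Vector<float> -> Vector<float>) (source: float[]) (result: float[]) (start: int) =
    let count = Vector<float>.Count
    let remaining = source.Length - start
    if remaining > 0 then
        let buffer = Array.zeroCreate<float> count
        Array.blit source start buffer 0 remaining
        let mapped = Array.zeroCreate<float> count
        (f (Vector<float>(buffer))).CopyTo(mapped)
        Array.blit mapped 0 result start remaining

/// Main loop written as a while loop over a mutable counter.
let mapWhile (f: Vector<float> -> Vector<float>) (source: float[]) =
    let count = Vector<float>.Count
    let result = Array.zeroCreate<float> source.Length
    let lastStart = source.Length - count
    let mutable i = 0
    while i <= lastStart do
        (f (Vector<float>(source, i))).CopyTo(result, i)
        i <- i + count
    mapTail f source result i
    result

/// Same work, with the main loop written as a tail-recursive function.
let mapRec (f: Vector<float> -> Vector<float>) (source: float[]) =
    let count = Vector<float>.Count
    let result = Array.zeroCreate<float> source.Length
    let lastStart = source.Length - count
    let rec loop i =
        if i <= lastStart then
            (f (Vector<float>(source, i))).CopyTo(result, i)
            loop (i + count)   // tail call: nothing left to do after it returns
        else
            i                  // index of the first leftover element
    let next = loop 0
    mapTail f source result next
    result
```

Both functions should produce the same output for the same input, e.g. `mapRec (fun v -> v + v) [| 1.0 .. 11.0 |]`; only the shape of the main loop differs.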

So this trick doesn't work with sum, map2, or map3 unfortunately. Works great with mapInPlace though! ``` ini Host Process Environment Information: BenchmarkDotNet=v0.9.8.0 OS=Microsoft Windows NT 6.2.9200.0 Processor=Intel(R) Core(TM) i7-4712HQ...

Yeah so I got some insight into what was going on from r/fsharp. I was using the loop counter variable in a lambda below, when dealing with the leftover bits...
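The comment above is cut off, so the following is only one plausible reading of it, stated as an assumption: when an F# `let mutable` local is captured by a lambda, the compiler implicitly stores it in a heap-allocated reference cell, and every use of that variable, including those in the hot loop, then goes through the cell. The sketch below uses hypothetical names (`sumCaptured`, `sumCopied`) and a plain integer sum rather than the library's SIMD code; it only shows the pattern and the usual workaround of copying the counter into an immutable local first.

```fsharp
/// The lambda passed to Array.init closes over the mutable counter `i`,
/// so `i` is implicitly allocated as a reference cell for the whole function.
let sumCaptured (source: int[]) (blockSize: int) =
    let mutable i = 0
    let mutable total = 0
    let lastStart = source.Length - blockSize
    while i <= lastStart do
        for j in i .. i + blockSize - 1 do
            total <- total + source.[j]
        i <- i + blockSize
    // Leftover elements: this closure captures the mutable `i`.
    let leftover = Array.init (source.Length - i) (fun k -> source.[i + k])
    total + Array.sum leftover

/// Same logic, but the counter is copied to an immutable local before the
/// lambda, so the loop counter itself is never captured.
let sumCopied (source: int[]) (blockSize: int) =
    let mutable i = 0
    let mutable total = 0
    let lastStart = source.Length - blockSize
    while i <= lastStart do
        for j in i .. i + blockSize - 1 do
            total <- total + source.[j]
        i <- i + blockSize
    let start = i   // immutable copy; only this value is captured below
    let leftover = Array.init (source.Length - start) (fun k -> source.[start + k])
    total + Array.sum leftover
```

The two functions return the same result; the difference is only in how the counter is stored. Compiling with warning 3180 enabled should report when a mutable local has been implicitly converted to a reference cell.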

Results with the loop stuff all sorted out. Big improvements for small arrays which is nice. ``` ini Host Process Environment Information: BenchmarkDotNet=v0.9.8.0 OS=Microsoft Windows NT 6.2.9200.0 Processor=Intel(R) Core(TM) i7-4712HQ...

I think, short of exotic hacks, things are as fast as they are going to get now. If anyone has exotic hacks, feel free to chime in.