Chris Elrod

837 comments by Chris Elrod

> In these examples, julia is running with 28 threads on a Xeon with 28 cores. The overhead goes down if I reduce the number of julia threads, but is...

We could add a minimum batch size argument. LoopVectorization won't support dual numbers until the rewrite unfortunately, and the rewrite won't support threading for some time. But it does try...
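To make the "minimum batch size" idea concrete, here is a minimal sketch using only Base Julia threads. It is not how FastBroadcast or Polyester implement it; `nbatches`, `threaded_map!`, and the `minbatch` keyword are hypothetical names chosen for illustration. The point is only that small inputs fall back to a serial loop, so they never pay the task-spawning overhead.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: how many batches to use so that each batch
# holds at least `minbatch` elements, capped at `nt` threads.
nbatches(len, minbatch, nt = nthreads()) = clamp(div(len, minbatch), 1, nt)

# Hypothetical threaded map with a minimum batch size (assumes 1-based vectors).
function threaded_map!(f, dest::Vector, src::Vector; minbatch = 1000)
    len = length(dest)
    length(src) == len || throw(DimensionMismatch("dest and src must have equal length"))
    nb = nbatches(len, minbatch)
    if nb == 1
        # Too little work to amortize threading overhead: run serially.
        for i in 1:len
            dest[i] = f(src[i])
        end
        return dest
    end
    @sync for k in 1:nb
        # Even split: every batch has at least div(len, nb) >= minbatch elements.
        lo = 1 + div((k - 1) * len, nb)
        hi = div(k * len, nb)
        @spawn for i in lo:hi
            dest[i] = f(src[i])
        end
    end
    return dest
end
```

With `minbatch = 1000`, a 10-element input runs on one thread regardless of how many threads Julia was started with, while a 10_000-element input can use up to ten batches if that many threads are available.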

> So is the issue just that threading always has this overhead and I haven't noticed before because I've been using it on more computationally heavy workloads? Notice that in...

> Is there an option for Polyester here?

`@batch minbatch=` is the easiest way when using Polyester directly. `@batch` sets up a call to `batch`. But `FastBroadcast` calls `batch` directly....
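For reference, using `@batch minbatch=` with Polyester directly might look like the sketch below. The function name `axpy_minbatch!` and the threshold of 1000 are illustrative assumptions, not values taken from the thread.

```julia
using Polyester

# Illustrative only: the function name and the 1000-element threshold are assumptions.
# With `minbatch=1000`, Polyester assigns at least 1000 iterations to each thread,
# so short loops use fewer threads (possibly just one).
function axpy_minbatch!(y, a, x)
    @batch minbatch=1000 for i in eachindex(y, x)
        y[i] = muladd(a, x[i], y[i])
    end
    return y
end
```

A reasonable threshold depends on how expensive each iteration is, so it is best chosen by benchmarking.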

perhaps `@🧵 x = foo(a, bar(c,d,e), f, g)`

This is an, uhh, interesting choice
```julia
julia> params(Float32)
76

julia> params(Float64)
76
```

What's the problem? Your `fast_foo9` is over 2x faster. EDIT: oh, even when broadcasting `b`. Huh.

FWIW, I got
```julia
julia> using FastBroadcast

julia> function fast_foo9(a, b, c, d, e, f, g, h, i)
           @.. a = b + 0.1 * (0.2c + 0.3d + 0.4e...
```

Comparing 30k evaluations, where `b` is fullsize and `bs` is the small version:
```julia
julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads)" begin
           foreachf(fast_foo9, 30_000, a, bs, c, d, e, f, g, h, i)
       end...
```

> Should we fix this inaccuracy by inserting a sleep call in the dynamic broadcasting branch?

Probably better to update the README instead, as the README claims FastBroadcast is slower...