Ben Vanik
this doesn't belong in flow either - global opt maybe
Question as to whether this is an artifact of the problem or of our tiling - if it's an issue with our tiling choices that feels like the thing to...
No, that's just within the dispatches - it does not create new dispatches or global tensors.
For my edification, why is exp so slow? Are we using libm and scalarizing the code? Global barriers and the extra memory traffic/cache misses feel like they'd need quite a...
> that's going to be resident in L2

is that true? I thought L2 cache was core local - when we distribute to 32 cores we're almost never going to...
(sorry, trying to understand - thanks for explaining :) In the case of a softmax being too small to distribute to multiple threads I don't think we'd care about exp...
Not sure what the latest numbers are, just some ancient ones from 2011 (!), and `V_EXP_F32` on GCN was 16 cycles. I suspect it's better now. So, in an alternate...
That's a really nice summary and a great way to reason about these kinds of things! I feel like that'd probably be a good template for other times we end...
(closing as stale)
ahh, think I found it: https://github.com/iree-org/iree/blob/9fe159d99d86f3292ae901427a159fb61898fa2c/runtime/src/iree/hal/allocator_heap.c#L269-L273 that's not quite right, as here it takes `ALL`, which includes `DISCARD`, and uses that to map... which then discards the existing contents...
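
To make the failure mode concrete, here is a minimal standalone sketch of why mapping with an "all" access mask that includes a discard bit can lose existing contents. Everything in it is hypothetical and for illustration only: the `ACCESS_*` flags, `sketch_map`, and `sketch_preserving_access` are not the IREE API, and zeroing on discard is just a visible stand-in for "contents become undefined".

```c
#include <stdint.h>
#include <string.h>

// Hypothetical access bits (assumption: not the real HAL enum values).
enum {
  ACCESS_READ = 1u << 0,
  ACCESS_WRITE = 1u << 1,
  ACCESS_DISCARD = 1u << 2,  // caller promises it does not need old contents
  ACCESS_ALL = ACCESS_READ | ACCESS_WRITE | ACCESS_DISCARD,
};

// Hypothetical map function: when DISCARD is requested the implementation is
// free to drop the old contents. Here that is simulated by zeroing the range.
static uint8_t* sketch_map(uint8_t* storage, size_t size, uint32_t access) {
  if (access & ACCESS_DISCARD) {
    memset(storage, 0, size);
  }
  return storage;
}

// One possible fix (an assumption, not the actual patch): strip DISCARD from
// the mask before mapping when the existing contents must survive.
static uint32_t sketch_preserving_access(uint32_t access) {
  return access & ~(uint32_t)ACCESS_DISCARD;
}
```

Passing `ACCESS_ALL` straight through to `sketch_map` wipes the buffer, while passing `sketch_preserving_access(ACCESS_ALL)` keeps read/write access and leaves the contents alone.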