Ben Vanik
Ahhh I think maybe the disconnect is that us submoduling nccl makes me think we're pinned to that nccl version we're building ourselves because that's what you'd generally use a...
We only need the header for a handful of tiny enums - we should just replicate those and then we don't even need the header file. We can't use the...
Yeah agreed, what we're talking about here is, taken to its extreme, effectively something like:
```
iree_status_t iree_hal_nccl_result_to_status(int result) {
  switch (result) {
    case /*ncclSuccess=*/0:
      return iree_ok_status();
    case /*ncclUnhandledCudaError=*/1:
      return...
```
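For reference, a minimal self-contained sketch of the "replicate the tiny enums" idea. Everything prefixed `demo_` or `LOCAL_` is hypothetical and stands in for the real IREE types. Values 0 and 1 come from the snippet above; the remaining values are assumed to match `ncclResult_t` in recent NCCL releases and would need checking against the header of whichever library gets loaded.

```c
// Locally replicated result codes so nccl.h is not required at build time.
// ASSUMPTION: values 2..5 mirror ncclResult_t; verify against the real header.
enum {
  LOCAL_NCCL_SUCCESS = 0,
  LOCAL_NCCL_UNHANDLED_CUDA_ERROR = 1,
  LOCAL_NCCL_SYSTEM_ERROR = 2,
  LOCAL_NCCL_INTERNAL_ERROR = 3,
  LOCAL_NCCL_INVALID_ARGUMENT = 4,
  LOCAL_NCCL_INVALID_USAGE = 5,
};

// Hypothetical stand-in for iree_status_t so the sketch compiles on its own.
typedef enum {
  DEMO_STATUS_OK = 0,
  DEMO_STATUS_INTERNAL,
  DEMO_STATUS_INVALID_ARGUMENT,
  DEMO_STATUS_FAILED_PRECONDITION,
  DEMO_STATUS_UNKNOWN,
} demo_status_t;

// Maps a raw integer result (as returned through a dynamically loaded symbol)
// to a status code without referencing any NCCL type.
static demo_status_t demo_nccl_result_to_status(int result) {
  switch (result) {
    case LOCAL_NCCL_SUCCESS:
      return DEMO_STATUS_OK;
    case LOCAL_NCCL_UNHANDLED_CUDA_ERROR:
    case LOCAL_NCCL_SYSTEM_ERROR:
    case LOCAL_NCCL_INTERNAL_ERROR:
      return DEMO_STATUS_INTERNAL;
    case LOCAL_NCCL_INVALID_ARGUMENT:
      return DEMO_STATUS_INVALID_ARGUMENT;
    case LOCAL_NCCL_INVALID_USAGE:
      return DEMO_STATUS_FAILED_PRECONDITION;
    default:
      // Codes added by newer library versions land here instead of
      // breaking the build.
      return DEMO_STATUS_UNKNOWN;
  }
}
```

The `default` case is the point of the exercise: an unknown code from a newer library degrades to a generic status rather than depending on a pinned header.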
This effectively no-op change breaks everything and I have no idea why.... oh~ cmake~
yeah the location now is just library name + ordinal, but we can add an optional src_locs table just like we have names/tags here: https://github.com/google/iree/blob/3bcf38ae322e0862d46639b8f6d3580c8d40ac2a/iree/hal/local/executable_library.h#L300 then on the compiler side...
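A rough sketch of what an optional source-location table next to names/tags could look like. The struct and field names here are hypothetical illustrations, not the actual `executable_library.h` definitions; the only idea taken from the comment is that the table is optional and indexed by export ordinal like the existing ones.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical source location record for one exported entry point.
typedef struct {
  const char* path;  // source file the dispatch originated from
  uint32_t line;     // line within that file
} demo_src_loc_t;

// Hypothetical export table: every table after `count` is optional and,
// when present, holds exactly `count` elements indexed by export ordinal.
typedef struct {
  uint32_t count;
  const char* const* names;        // optional, may be NULL
  const char* const* tags;         // optional, may be NULL
  const demo_src_loc_t* src_locs;  // optional, may be NULL
} demo_export_table_t;

// Returns the source location for `ordinal`, or NULL when the library was
// built without the table or the ordinal is out of range. Callers then fall
// back to library name + ordinal.
static const demo_src_loc_t* demo_lookup_src_loc(
    const demo_export_table_t* table, uint32_t ordinal) {
  if (!table->src_locs || ordinal >= table->count) return NULL;
  return &table->src_locs[ordinal];
}
```

Keeping the table optional means libraries built without locations stay valid and consumers only need a NULL check.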
(oh tracy does capture and embed source files, so the traces would still be hermetic even with a source listing, it's really just hermetic vmfb vs vmfb+listing file on the...
summary tag will be very useful when skimming/looking at the summary view and trying to see what's taking time. stream or untranslated HAL executables would be a good point for...
ah - I'd just name the whole dispatch (`main_dispatch_23_matmul_MxNxK`)
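As an illustration only, composing such a name from its parts might look like the following. The helper and its parameters are hypothetical; the actual naming would happen on the compiler side and the pieces used here (entry point, ordinal, summary) are assumptions read off the example name.

```c
#include <stdio.h>

// Hypothetical helper: builds "<entry>_dispatch_<ordinal>_<summary>".
// Returns the snprintf result, i.e. the length the full name needs
// (excluding the NUL terminator), so callers can detect truncation.
static int demo_format_dispatch_name(char* buffer, size_t capacity,
                                     const char* entry, int ordinal,
                                     const char* summary) {
  return snprintf(buffer, capacity, "%s_dispatch_%d_%s", entry, ordinal,
                  summary);
}
```

Calling it with `"main"`, `23`, and `"matmul_MxNxK"` yields `main_dispatch_23_matmul_MxNxK`.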
yeah - naming the dispatches in 1 would help all tooling (tracy, perf, our own stuff whenever we have it) as it's the easiest to get universally (it'll end up...
I don't think it's directly transferrable but as a reference I had to solve some similar issues for stream partitioning with respect to obeying SSA use-def dominance, cloning into consumers,...