Paul Fultz II
Also, another reference we used when we were discussing the flash decoding: https://arxiv.org/pdf/2402.05099
Since MSVC implements [P0533R9](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p0533r9.pdf), this breaks when compiling with HIP. I opened an issue [here](https://github.com/ROCm/llvm-project/issues/285) for the compiler team to fix this. For now, we need to use C++17 on Windows, but...
Let me try C++20.
You need to run `make generate` to update `src/api/api.cpp` and `src/api/include/migraphx/migraphx.h`. It looks like you updated them manually.
It seems this isn't fixing the issue, just bypassing it. The multi-output fusion shouldn't crash regardless of what was run before it.
> Logging with levels is a good idea. I can work on Logging with different levels in a separate work item. But i think for now, we should keep warnings...
Thinking about this more, we can reuse our memory_coloring pass and then just do some post processing. So first we would lower the pointwise operators to an inner_pointwise that takes...
@turneram You still need to remove the GQA gpu kernel. Are you able to remove that in a followup PR?
> Would it be possible somehow to fuse pointwises across transposes?

Yes, we need to extend the `rewrite_reshapes` to handle that by updating the axis map with the new...
So the stable sorting is 2.5x slower on some configs. We can recover some perf by lowering the split threshold. Most of the perf cost comes from the wavefront sorting...
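
To make the "split threshold" idea concrete, here is a minimal CPU-side sketch of a stable sort with a tunable threshold. It is not the GPU kernel discussed above: the names `threshold_stable_sort`, `insertion_sort`, and `split_threshold` are hypothetical, and I am assuming the threshold is the range size below which the sort stops splitting and sorts the block directly. How the threshold affects performance depends on the hardware and the kernel, so this only illustrates the structure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Stable insertion sort used for small blocks (hypothetical helper).
template <class Iterator, class Compare>
void insertion_sort(Iterator first, Iterator last, Compare comp)
{
    for(auto it = first; it != last; ++it)
    {
        // upper_bound places the element after any equal elements,
        // so equal keys keep their original relative order.
        auto pos = std::upper_bound(first, it, *it, comp);
        std::rotate(pos, it, std::next(it));
    }
}

// Hypothetical sketch: split the range in half until it is no longer than
// split_threshold, sort each small block directly, then merge the halves.
template <class Iterator, class Compare>
void threshold_stable_sort(Iterator first, Iterator last, Compare comp, std::size_t split_threshold)
{
    using diff_t = typename std::iterator_traits<Iterator>::difference_type;
    auto n = static_cast<std::size_t>(std::distance(first, last));
    if(n <= split_threshold || n < 2)
    {
        insertion_sort(first, last, comp);
        return;
    }
    auto mid = std::next(first, static_cast<diff_t>(n / 2));
    threshold_stable_sort(first, mid, comp, split_threshold);
    threshold_stable_sort(mid, last, comp, split_threshold);
    // std::inplace_merge is stable: on ties, elements from the left half come first.
    std::inplace_merge(first, mid, last, comp);
}
```

A lower threshold shifts work from the small-block sort to the merge steps, and a higher one does the opposite. Which side is cheaper has to be measured per configuration.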