AMD's graph optimization engine.

Results 433 AMDMIGraphX issues

This issue has two parts. The first part is to fuse reductions (including split reductions) with MLIR, including any pointwise operations. The second part is to use multiple outputs when fusing, to...

Perf Improve

Update our [Dockerfile](https://github.com/ROCm/AMDMIGraphX/blob/develop/Dockerfile) and [hip-clang.docker](https://github.com/ROCm/AMDMIGraphX/blob/develop/hip-clang.docker). Additional files may also need updating: https://github.com/ROCm/AMDMIGraphX/tree/develop/tools/docker

Continuous Integration

Add weight streaming to allow running large models on GPUs with limited memory. Closes #3156.

enhancement
Windows
Ubuntu
UAI

```
@404 = gpu::code_object[code_object=6464,symbol_name=mlir_convolution_add,global=102400,local=256,](@402,@293,@400,@403) -> half_type, {1, 255, 80, 80}, {1632000, 6400, 80, 1}, target_id=0: 0.0226192ms, 1%
@405 = reshape_lazy[dims={1, 3, 85, 80, 80}](@404) -> half_type, {1, 3, 85, 80,...
```

Perf Improve
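
The shape annotations in the trace above list dimensions followed by strides. As a rough sketch in plain Python (not the MIGraphX API; `packed_strides` and `is_packed` are hypothetical helper names introduced here for illustration), the strides shown are consistent with an ordinary packed row-major layout:

```python
def packed_strides(lens):
    """Return row-major (packed) strides, in elements, for the given dimensions."""
    strides = [1] * len(lens)
    # Walk from the second-to-last dimension outward; each stride is the
    # product of all dimensions to its right.
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    return strides


def is_packed(lens, strides):
    """True when the strides match a packed row-major layout for these dimensions."""
    return list(strides) == packed_strides(lens)


# Dimensions and strides copied from the trace snippets on this page.
print(packed_strides([1, 255, 80, 80]))                      # [1632000, 6400, 80, 1]
print(is_packed([1, 25200, 85], [2142000, 85, 1]))           # True
```

A check like this is only a reading aid for the trace output: a tensor whose strides differ from the packed values is laid out non-contiguously (for example, after a transpose or a slice).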

Figure out a way to support weight streaming at runtime, i.e., to fit large models on the GPU without needing to know literal sizes ahead of time - [x]...

enhancement
Windows
Ubuntu
UAI
Under Investigation

There appears to be an occasional issue in which we try to allocate a buffer on the GPU whose size looks like an overflowed UInt64. @kahmed10 has reportedly...

bug
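
The snippet above does not say where the bad size comes from. As a purely illustrative sketch, assuming the size is produced by 64-bit unsigned arithmetic, the Python below emulates how a small negative intermediate wraps to an enormous value, and shows a guarded size computation. `wrap_u64` and `checked_nbytes` are hypothetical names, not MIGraphX functions.

```python
UINT64_MAX = 2**64 - 1


def wrap_u64(value):
    """Emulate storing an integer in an unsigned 64-bit variable (wraps modulo 2**64)."""
    return value & UINT64_MAX


def checked_nbytes(lens, elem_size):
    """Compute a buffer size in bytes, refusing sizes that cannot fit in 64 bits."""
    total = elem_size
    for dim in lens:
        if dim < 0:
            raise ValueError(f"negative dimension: {dim}")
        total *= dim
    if total > UINT64_MAX:
        raise OverflowError(f"buffer size {total} does not fit in an unsigned 64-bit integer")
    return total


# A subtraction that goes slightly negative becomes a huge unsigned value.
print(wrap_u64(16 - 32))                   # 18446744073709551600
# A half-precision {1, 25200, 85} tensor needs 2 bytes per element.
print(checked_nbytes([1, 25200, 85], 2))   # 4284000
```

Validating the size before it reaches the allocator turns a confusing allocation failure into an error that names the offending value.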

```
@440 = gpu::code_object[code_object=9544,symbol_name=concat_kernel,global=714000,local=1024,](@439,@436,@437,@438) -> half_type, {1, 25200, 85}, {2142000, 85, 1}, target_id=0: 0.0305542ms, 2%
main:#output_0 = @param:main:#output_0 -> float_type, {1, 25200, 85}, {2142000, 85, 1}, target_id=0: 0.0007664ms, 1%
@442...
```

Perf Improve

Due to the use of `__syncthreads` in the `reduce` methods, registers are not reused. We can reuse them directly by assigning to them with `r.inner([](auto& y, auto x) { y...

Perf Improve