AMDGPU.jl
AMD GPU (ROCm) programming in Julia
@jpsamaroo I also improved the dispatch for complex `ROCArray`.
Update the doc section on queue handling to reflect the latest syntax.
Some libraries, like rocSPARSE, call HIP functions which expect to be passed allocations generated from `hipMalloc` and friends. Because `hipMalloc` just ends up calling HSA allocation functions, we should be...
Previously, a call to `killqueue()` would not clean up the global `QUEUES` list. On my device, a new call to `Queue()` would reuse the same queue pointer, and this would...
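The cleanup described in this report could be sketched roughly as follows. This is a hypothetical illustration, not the actual AMDGPU.jl internals: the names `kill_queue!`, `QUEUES_LOCK`, and the field access `queue.queue` are assumptions; only the global `QUEUES` list comes from the issue text.

```
# Hypothetical sketch: drop the queue from the global QUEUES list when it
# is killed, so a later Queue() that happens to reuse the same underlying
# queue pointer does not pick up stale state.
function kill_queue!(queue)
    lock(QUEUES_LOCK) do
        # assumed: QUEUES is keyed by the raw queue pointer
        delete!(QUEUES, queue.queue)
    end
    # ...then proceed with the HSA-level queue destruction...
end
```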
I've noticed a number of competing techniques used to manage data races throughout the codebase. There is the RT_LOCK (I assume RT = runtime) used to manage global state access...
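One way to consolidate the competing techniques mentioned above is to funnel all global-state mutation through a single lock helper, so the locking discipline lives in one place. A minimal sketch, assuming a `ReentrantLock`-based `RT_LOCK` as in the issue text (the helper name `with_runtime_lock` is hypothetical):

```
const RT_LOCK = ReentrantLock()

# Hypothetical helper: every mutation of global runtime state goes
# through this function, so there is exactly one locking convention
# to audit for data races.
function with_runtime_lock(f)
    lock(RT_LOCK) do
        f()
    end
end
```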
This test currently fails (Julia 1.8, MI250x) https://github.com/JuliaGPU/AMDGPU.jl/blob/e0a48dd9aadc0329e176a983ff0d7ee0e824b252/test/hsa/memory.jl#L184 with the following error:
```
Region API Queries: Test Failed at /pfs/lustrep4/scratch/project_465000139/lurass/AMDGPU.jl/test/hsa/memory.jl:184
  Expression: all(Runtime.region_host_accessible, regions_finegrained)
Stacktrace:
 [1] macro expansion
   @ /pfs/lustrep4/scratch/project_465000139/lurass/julia_local/julia-1.8.0/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
 [2]...
```
Hello, I just got a new AMD RX 6700 XT and have been testing out some AMDGPU features with it. When using it for the first time, I got the...
CUDA has a great feature for sizing threads and blocks, namely launch_configuration(). I rarely size my kernels manually; instead I do something like:
```
kernel = @cuda launch=false myfunc(args...)
config = launch_configuration(kernel.fun)...
```
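For context, the CUDA.jl pattern being requested above can be sketched in full. This is a sketch of the CUDA.jl occupancy API, not of AMDGPU.jl (which, per this request, lacked an equivalent at the time); the kernel `vadd!` and the array sizes are illustrative:

```
using CUDA

# Simple elementwise-add kernel used only to demonstrate the sizing API.
function vadd!(c, a, b)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    i <= length(c) && (c[i] = a[i] + b[i])
    return
end

a = CUDA.rand(1024); b = CUDA.rand(1024); c = similar(a)

# Compile without launching, ask the driver for an occupancy-based
# configuration, then derive threads/blocks from it.
kernel = @cuda launch=false vadd!(c, a, b)
config = launch_configuration(kernel.fun)
threads = min(length(c), config.threads)
blocks = cld(length(c), threads)
kernel(c, a, b; threads, blocks)
```

The appeal of this pattern is that the driver, not the user, picks a thread count that keeps occupancy high for the compiled kernel, so the same code adapts across devices.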