accelerate-llvm
Library won't utilize all GPU cores [BUG]
Description
When I run a function on the GPU that should occupy all available cores, it instead uses only roughly 10% of them. I'm not sure whether this is a misconfiguration or a bug.
Steps to reproduce
Compile the following code:
```haskell
module Main where

import Data.Array.Accelerate          as A
import Data.Array.Accelerate.LLVM.PTX as PTX
import Prelude                        as P

-- Logistic (sigmoid) function: 1 / (1 + e^(-x))
func :: Exp Double -> Exp Double
func x =
  let one = constant 1.0 :: Exp Double
  in  one / (one + exp (-x))

func2 :: Acc (Vector Int) -> Acc (Vector Double) -> Acc (Vector (Int, Double))
func2 a b = A.zip a b

func3 :: Acc (Vector (Int, Double)) -> Acc (Vector (Int, Double))
func3 x =
  let (a, b) = A.unzip x
      b2     = A.map func b
      a2     = A.map (\z -> z - (constant 1 :: Exp Int)) a
  in  A.zip a2 b2

test :: Acc (Vector (Int, Double)) -> Acc (Scalar Bool)
test x =
  let (a, _) = A.unzip x
  in  A.unit (constant 1 A./= (a A.!! constant 0))

numCudaCores :: Int
numCudaCores = 2048 -- I have an NVidia GeForce RTX 3050 Laptop GPU; I'm pretty sure it has 2048 cores.

main :: IO ()
main = do
  let arr    = A.fill (constant (Z :. numCudaCores)) 1.0 :: Acc (Vector Double)
      arr2   = A.fill (constant (Z :. numCudaCores)) 10000000 :: Acc (Vector Int)
      func4  = A.awhile test func3 (func2 arr2 arr)
      result = PTX.run func4
  print result
```
Then, on a Linux machine, run nvtop while the program is executing: it shows the GPU at only around 8% utilization.
Expected behaviour
The code should use all CUDA cores.
Your environment
OS: Arch Linux (up to date as of January 2nd, 2023). GPU: NVidia GeForce RTX 3050 Laptop GPU. GHC: 9.0.2
accelerate-llvm-ptx commit: 6f3b6ca2c693d84c43b62b066c70e14a9e73ce63 accelerate commit: 5971c5d8e4dbba28d2017e7ce422cf46a20197cb
I can't post the output of nvidia-device-query because that program isn't installed on my machine, and I can't find any information about which Arch package provides it.
Additional context
You may need to use a larger input to use all GPU cores. Can you try that?
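One way to try that suggestion, reusing `func2`, `func3`, and `test` from the program above, is to make the array length much larger than the core count, so the backend has many more threads of work than the GPU has cores. This is only a sketch: the size 2^24 is an arbitrary choice for illustration, not taken from the report.

```haskell
-- Sketch: same program as above, but with an input much larger than
-- numCudaCores. The exponent 24 is an arbitrary illustrative choice.
n :: Int
n = 2 P.^ (24 :: Int)

main :: IO ()
main = do
  let arr   = A.fill (constant (Z :. n)) 1.0 :: Acc (Vector Double)
      arr2  = A.fill (constant (Z :. n)) 10000000 :: Acc (Vector Int)
      func4 = A.awhile test func3 (func2 arr2 arr)
  print (PTX.run func4)
```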
Currently I'm not able to get the llvm backends to compile, so no.
Edit: I got it to compile, and using larger inputs doesn't increase core utilization.
Edit 2: The native backend doesn't use all CPU cores either.
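For reference, a CPU variant of the same program (reusing `func2`, `func3`, and `test` from above) might look like the sketch below. One assumption worth checking here: if the native backend sizes its worker pool from the GHC RTS capability count, the program would need to be built with `-threaded` and run with e.g. `+RTS -N` before it can use all CPU cores at all.

```haskell
-- Sketch of the native-backend variant. Assumption (not from the
-- report): the worker count follows the RTS capabilities, so build
-- with -threaded and run with `+RTS -N`.
import Data.Array.Accelerate             as A
import Data.Array.Accelerate.LLVM.Native as CPU

main :: IO ()
main = do
  let n    = 2048
      arr  = A.fill (constant (Z :. n)) 1.0 :: Acc (Vector Double)
      arr2 = A.fill (constant (Z :. n)) 10000000 :: Acc (Vector Int)
  print (CPU.run (A.awhile test func3 (func2 arr2 arr)))
```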