accelerate-llvm

Library won't utilize all GPU cores [BUG]

Open noahmartinwilliams opened this issue 2 years ago • 2 comments

Description

When I run a function on the GPU that should occupy all available cores, it doesn't; instead it uses roughly 10% of them. I'm not sure whether this is a misconfiguration or a bug.

Steps to reproduce

Compile the following code:

```haskell
module Main where

import Data.Array.Accelerate          as A
import Data.Array.Accelerate.LLVM.PTX as PTX
import Prelude                        as P

-- Logistic sigmoid: 1 / (1 + e^(-x)).
func :: Exp Double -> Exp Double
func x = one / (one + exp (-x))
  where
    one = constant 1.0 :: Exp Double

-- Pair each counter with its value.
func2 :: Acc (Vector Int) -> Acc (Vector Double) -> Acc (Vector (Int, Double))
func2 a b = A.zip a b

-- One loop step: decrement every counter and apply the sigmoid to every value.
func3 :: Acc (Vector (Int, Double)) -> Acc (Vector (Int, Double))
func3 x =
  let (a, b) = A.unzip x
      b2     = A.map func b
      a2     = A.map (\z -> z - constant 1) a
  in  A.zip a2 b2

-- Loop condition: keep iterating while the first counter is not 1.
test :: Acc (Vector (Int, Double)) -> Acc (Scalar Bool)
test x =
  let (a, _) = A.unzip x
  in  A.unit (constant 1 A./= (a A.!! constant 0))

numCudaCores :: Int
numCudaCores = 2048 -- I have an NVIDIA GeForce RTX 3050 Laptop GPU; I'm pretty sure it has 2048 cores.

main :: IO ()
main = do
  let arr    = A.fill (constant (Z :. numCudaCores)) 1.0      :: Acc (Vector Double)
      arr2   = A.fill (constant (Z :. numCudaCores)) 10000000 :: Acc (Vector Int)
      -- Apply func3 repeatedly until test returns False.
      func4  = A.awhile test func3 (func2 arr2 arr)
      result = PTX.run func4
  print result
```

Then run nvtop on a Linux machine while the program is running: it shows the GPU at only around 8% utilization.

Expected behaviour

The code should use all CUDA cores.
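For context on how the reproduction splits up its work, here is a plain-Haskell model of the loop's control flow (base only, no Accelerate and no GPU; `loopIterations` and `elementUpdates` are hypothetical helper names, not part of accelerate or accelerate-llvm). It mirrors `A.awhile test func3`: the counter starts at 10000000, `func3` subtracts 1 on each pass, and `test` stops the loop once the counter reaches 1. This only counts loop passes and elements; it measures nothing on the GPU.

```haskell
-- Plain-Haskell model of the reproduction's loop (no GPU involved).
-- Hypothetical helpers for illustration; assumes a start value >= 1.

-- Number of times the loop body runs: like 'A.awhile test func3',
-- keep stepping while the counter is not 1, subtracting 1 each pass.
loopIterations :: Int -> Int
loopIterations start = length (takeWhile (/= 1) (iterate (subtract 1) start))

-- Total element updates over the whole run: every pass maps over the
-- full vector of length 'len'. Integer is used so the product cannot overflow.
elementUpdates :: Int -> Int -> Integer
elementUpdates start len = toInteger (loopIterations start) * toInteger len
```

With the reproduction's values, `loopIterations 10000000` is 9999999 and `elementUpdates 10000000 2048` is 20479997952: the work arrives as roughly ten million sequential loop passes, each over a vector of only 2048 elements.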

Your environment

- OS: Arch Linux (latest version as of January 2, 2023)
- GPU: NVIDIA GeForce RTX 3050 Laptop GPU
- GHC: 9.0.2
- accelerate-llvm-ptx commit: 6f3b6ca2c693d84c43b62b066c70e14a9e73ce63
- accelerate commit: 5971c5d8e4dbba28d2017e7ce422cf46a20197cb

I can't post the output of nvidia-device-query because that program isn't installed on my computer, and I can't find any information about which Arch package provides it.

Additional context

noahmartinwilliams, Jan 02 '23 22:01

You may need to use a larger input to use all GPU cores. Can you try that?

ivogabe, Apr 12 '23 20:04

Currently I'm not able to get the LLVM backends to compile, so no.

Edit: I got it to compile, and using larger inputs doesn't make it use any more cores.

Edit 2: The native backend doesn't use all CPU cores either.

noahmartinwilliams, Apr 17 '23 18:04