
Non-stable work with large matrices

Open dselivanov opened this issue 9 years ago • 14 comments

Consider the following example:

library(gpuR)
ORDER = 1024 * 6
set.seed(1)
A = matrix(rnorm(ORDER^2), nrow=ORDER)
B = matrix(rnorm(ORDER^2), nrow=ORDER)
object.size(A)/1e6
#301.990088 bytes

C = A %*% B
C[1:10]
# [1]   80.85419   80.67761   41.98283 -126.17838  -99.32701   55.94015
# [7]  108.84205 -150.05794  -84.27298  -80.38638

gpuA = gpuMatrix(A, type="float")
gpuB = gpuMatrix(B, type="float")

gpuC = gpuA %*% gpuB
str(gpuC)
# flt [1:6144, 1:6144] 0 0 0 0 0 ...

When I try to use vclMatrix it crashes with:

Abort trap: 6.
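As a side note, a float-tolerance comparison makes the gpuMatrix discrepancy above easy to quantify: a single-precision result normally agrees with base R's double-precision %*% to roughly 1e-5 relative error, while an all-zero gpuC is off by order 1. A minimal sketch (rel_err is a hypothetical helper, not part of gpuR):

```r
# Largest entrywise relative difference, guarding against division by ~0.
rel_err <- function(x, y) max(abs(x - y) / pmax(abs(y), 1))

# Usage (assuming gpuC and C from the example above):
# rel_err(as.matrix(gpuC), C)  # roughly 1e-5 is expected for float;
#                              # the all-zero result gives a value near 1
```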

dselivanov avatar Sep 27 '16 16:09 dselivanov

Hi @dselivanov, I assume you are still working from develop here. I have run this on my local machine and it works without a problem for both gpuMatrix and vclMatrix. What GPU do you have? Is it possible that it is running out of RAM?

cdeterman avatar Sep 27 '16 17:09 cdeterman

@cdeterman sorry, yes I'm working with dev branch. Here is system info:

Number of platforms: 1

  • platform: Apple: OpenCL 1.2 (Aug 10 2016 17:16:39)
    • gpu index: 0
      • Iris Pro

checked all devices
completed initialization
gpuR 1.1.4

gpuInfo()

$deviceName
[1] "Iris Pro"

$deviceVendor
[1] "Intel"

$numberOfCores
[1] 40

$maxWorkGroupSize
[1] 512

$maxWorkItemDim
[1] 3

$maxWorkItemSizes
[1] 512 512 512

$deviceMemory
[1] 1610612736

$clockFreq
[1] 1200

$localMem
[1] 65536

$maxAllocatableMem
[1] 402653184

$available
[1] "yes"

Maybe it is the case that the GPU runs out of memory... It has 1536 MB and the float matrices should occupy ~450 MB = 3 * 150 MB. I don't know how to check the available RAM for Intel GPUs.
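That arithmetic checks out; a quick base R sketch (no gpuR calls, MB meaning 1e6 bytes):

```r
ORDER <- 1024 * 6
bytes_per_float <- 4                    # single precision
mat_bytes <- ORDER^2 * bytes_per_float  # one 6144 x 6144 float matrix
total_bytes <- 3 * mat_bytes            # A, B, and the result C
mat_bytes / 1e6    # ~151 MB per matrix
total_bytes / 1e6  # ~453 MB in total
# For comparison, gpuInfo() above reports maxAllocatableMem of 402653184
# (~403 MB): each single buffer fits, but the three together exceed it.
```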

dselivanov avatar Sep 27 '16 17:09 dselivanov

When the GPU goes OOM it produces:

ViennaCL: FATAL ERROR: Kernel start failed for '_prod_TT'.
Error in eval(substitute(expr), envir, enclos) :
  ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
  ViennaCL could not allocate memory on the device.
  Most likely the device simply ran out of memory.
  If you think that this is a bug in ViennaCL, please report it at
  [email protected] and supply at least the following information:

  • Operating System
  • Which OpenCL implementation (AMD, NVIDIA, etc.)
  • ViennaCL version

  Many thanks in advance!

So it seems that was a different case.

dselivanov avatar Sep 27 '16 17:09 dselivanov

Apparently this has to do with OSX, according to the thread here. However, that thread recommends using only the GPU, which it appears you are already doing. I will need to look at this some more and potentially ping the ViennaCL maintainer.

cdeterman avatar Sep 27 '16 17:09 cdeterman

@dselivanov for the sake of completeness can you provide the output of gpuA when it is a vclMatrix so I can see the other fields of the object?

cdeterman avatar Sep 27 '16 18:09 cdeterman

Do you mean this?

library(gpuR)
ORDER = 1024 * 6
set.seed(1)
A = matrix(rnorm(ORDER^2), nrow=ORDER)
gpuA = vclMatrix(A, type="float")
gpuA

An object of class "fvclMatrix"
Slot "address":
pointer: 0x7f812ff2ffb0

Slot ".context_index":
[1] 1

Slot ".platform_index":
[1] 1

Slot ".platform":
[1] "Apple"

Slot ".device_index":
[1] 1

Slot ".device":
[1] "Iris Pro"

dselivanov avatar Sep 27 '16 19:09 dselivanov

@dselivanov yes, thank you. I wanted to confirm it says it is using the Iris Pro. One last thing that occurred to me, which I should have asked at the start, is for the output of listContexts() as well, to see all possible contexts that were initialized by gpuR. Thanks.

cdeterman avatar Sep 27 '16 19:09 cdeterman

str(listContexts())
'data.frame': 1 obs. of 6 variables:
 $ context       : int 1
 $ platform      : Factor w/ 1 level "Apple: OpenCL 1.2 (Aug 10 2016 17:16:39)": 1
 $ platform_index: int 0
 $ device        : Factor w/ 1 level "Iris Pro": 1
 $ device_index  : int 0
 $ device_type   : Factor w/ 1 level "gpu": 1

dselivanov avatar Sep 27 '16 19:09 dselivanov

@dselivanov unfortunately this is an unknown at this point (see issue here). I will need to see about setting up a machine to experiment on an OSX system (I don't have one myself, and iterating on Travis will be painfully slow). I will let you know as things progress.

cdeterman avatar Sep 27 '16 20:09 cdeterman

Ok, just let me know what is needed - I'll be happy to assist through Skype/messengers.

dselivanov avatar Sep 28 '16 05:09 dselivanov

@dselivanov it's unclear from the previous comments: does this work for smaller matrices on your machine? If so, can you experiment to find at which point the error is thrown as the matrix size grows?
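One way to run that experiment is to sweep the order upward and report the first failure. A sketch (sweep_orders is a hypothetical helper, not part of gpuR); note that tryCatch can only catch R-level errors such as the ViennaCL allocation failure - a hard crash like "Abort trap: 6" kills the whole R process, so each size may need to be tried in a fresh session:

```r
# Try orders 1024 * k until `mult` fails; returns the first failing order.
sweep_orders <- function(mult, ks = 1:8) {
  for (k in ks) {
    n <- 1024 * k
    A <- matrix(rnorm(n^2), nrow = n)
    ok <- tryCatch({ mult(A); TRUE }, error = function(e) FALSE)
    cat(sprintf("order %d: %s\n", n, if (ok) "ok" else "FAILED"))
    if (!ok) return(n)
  }
  invisible(NULL)
}

# On the GPU (requires gpuR and a working OpenCL context):
# sweep_orders(function(A) {
#   gA <- gpuR::vclMatrix(A, type = "float")
#   gA %*% gA
# })
```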

cdeterman avatar Sep 28 '16 13:09 cdeterman

1024 * 5 works, 1024 * 6 doesn't work.


dselivanov avatar Sep 28 '16 14:09 dselivanov

@dselivanov unfortunately I still cannot reproduce the problem. I started a debug session on Travis and used ssh to access an OSX build. It worked without a problem on a matrix with order > 1024 * 6. That said, those machines don't have a GPU, so I suspect something is wrong outside of the gpuR code that I cannot control. I can leave this issue open, but I think I will remove it from the 1.2.0 release because, as I said, it appears to be independent of my code.

cdeterman avatar Nov 16 '16 14:11 cdeterman

Thanks for digging.


dselivanov avatar Nov 16 '16 15:11 dselivanov