BLASX large scale Dgemm segmentation fault

Hello, there are still errors when applying the library to a large matrix gemm on multiple GPUs. I need to find another library that can replace cublasXt and execute large-scale matrix gemm on multiple GPUs. Therefore, I tested Dgemm on 50000X50000 and 50000X50000. In the end, I got a segmentation error. I tried to figure out what the problem was. But it's hard for me to go deep. I just found that cublasGetMatrixAsync and cublasSetMatrixAsync seem to be wrong in blasx_dgemm.c. Here is the gdb info about this. 1575817709021 If you can help, I really appreciate it.

Dec 08 '19 15:12 yuyangxie96

can you try a smaller case, e.g. 20000 x 20000?

Dec 08 '19 17:12 linnanwang

Would you please tell me your environment? Which cuda version and how many GPUs are you using?

Dec 08 '19 19:12 eddy16112

Yes, it can work for 20000 x 20000 double matrix. Sorry, I just found another issues about this. So it seems the 50000 x 50000 double matrix exceeds the available device memory and there is no memory check. It is a pity to use the lib to solve large-scale problems.

Dec 09 '19 02:12 yuyangxie96

okay. It is already very impressive to see 2*10^4 case working on GPU.

The purpose of this library is to demonstrate a new system design to support large scale matrix multiplications, not for production use. We're researchers to focus on ideas. Maintaining and implementing a production level code base is too laborious for us to do, therefore I recommend you using cuBLAS-XT instead. Thank you for your interest.

Dec 09 '19 02:12 linnanwang

I reproduced your bug and found out the issue of seg fault came from the overflow of int. https://github.com/linnanwang/BLASX/blob/master/blas/blasx_dgemm.c#L64 I already created a PR and will merge it into master ASAP.

The advantage of BLASX is allowing super large matrix that can not be fit in GPU memory as long as it can be fit into CPU memory, because we split matrix into tiles and have a task runtime to swap tiles into/out of GPU memory.

Dec 09 '19 07:12 eddy16112

Thanks very much!!! I take the change to the code. But unfortunately, there still a seg fault. When running large matrix(n=50000), it will cause seg fault again. It seems carsh when running mem_control_kernel_double. 1575891205947 I have four GeForce GTX 1080 Ti with 10G device memory.

Dec 09 '19 12:12 yuyangxie96

I feel like it is another overflow. I ran 50K*50K sgemm with 1 GPU and it works fine. I do not have a multi-gpu machine now. Can you try sgemm on 1 GPU?

Dec 09 '19 18:12 eddy16112

BLASX BLASX copied to clipboard

large scale Dgemm segmentation fault

BLASX
BLASX copied to clipboard