very slow diagonalization with openblas-ohpc in 1.3.1
Hi,
I'm a newbie using OpenHPC for the first time, and I'm really grateful for this project since it has saved me a lot of time.
However, it turns out that matrix diagonalization (dsyev) with the OpenBLAS shipped in 1.3.1 is significantly slower than my own single-threaded build with the Sandy Bridge kernel, installed in my home directory.
What options are used when building the openblas-gnu7-ohpc RPM package?
It even hangs sometimes when running my simulation code, which involves a huge number of diagonalizations, so it is not practically usable for me. The speed difference is already noticeable with the simple test code attached below, compiled with "-march=native -O2". In a few runs on my frontend node (Xeon E5-2620 v2), the test code was on average about 4 times slower with openblas-gnu7-ohpc.
```cpp
#include <iostream>
#include <cstdlib>   // srand48, drand48
#include <cassert>
#include <ctime>     // time

extern "C" {
void dsyev_(const char* jobz, const char* uplo, int* n,
            double* a, int* lda,
            double* w, double* work, int* lwork, int* info);
}

// Query the optimal workspace size, then run the actual diagonalization.
void syev(const char jobz, const char uplo, int n, double* a, double* w)
{
    int lwork = -1, info;
    double tmp;
    dsyev_(&jobz, &uplo, &n, a, &n, w, &tmp, &lwork, &info);  // workspace query
    assert(info == 0);
    lwork = (int)(tmp + 0.1);
    double* work = new double[lwork];
    dsyev_(&jobz, &uplo, &n, a, &n, w, work, &lwork, &info);
    delete[] work;
    assert(info == 0);
}

int main()
{
    srand48(time(NULL));
    for (int k = 0; k < 100; ++k) {
        // Random symmetric matrix of size 200-400 with a shifted diagonal.
        const int N = 200 + (int)(200.0 * drand48());
        double* mat = new double[N * N];
        double* eig = new double[N];
        for (int j = 0; j < N; ++j) {
            for (int i = j; i < N; ++i) {
                mat[i + j * N] = drand48();
                mat[j + i * N] = mat[i + j * N];
            }
            mat[j + j * N] += 2.0;
        }
        syev('V', 'U', N, mat, eig);
        std::cout << N << "\t" << eig[0] << std::endl;
        delete[] eig;
        delete[] mat;
    }
}
```
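To see what the package was built with from the library itself, I suppose one can query OpenBLAS at runtime. Here is a minimal sketch, assuming the packaged library is a recent enough OpenBLAS to export openblas_get_config, openblas_get_corename, and openblas_set_num_threads (older builds may lack some of these):

```cpp
#include <iostream>

// These symbols are exported by OpenBLAS itself; declaring them directly
// avoids depending on a cblas.h header being installed.
extern "C" {
char* openblas_get_config();    // build options the library was compiled with
char* openblas_get_corename();  // kernel selected at runtime (DYNAMIC_ARCH builds)
void openblas_set_num_threads(int num_threads);
}

int main()
{
    std::cout << "config: " << openblas_get_config() << "\n";
    std::cout << "kernel: " << openblas_get_corename() << "\n";
    // Force single-threaded BLAS. For matrices this small, threading
    // overhead (or oversubscription under MPI/OpenMP) can dominate, and
    // could plausibly explain the hangs as well.
    openblas_set_num_threads(1);
    return 0;
}
```

Setting OPENBLAS_NUM_THREADS=1 in the environment should have the same effect as the openblas_set_num_threads call, and might be worth trying given the hangs.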
Hi @dhkim1231 - thanks for the report. We don't have anything conclusive yet, but we tried your code on a few different builds and got the following timings (on Haswell):
| Version | Build | Flags | Wallclock Time |
|---|---|---|---|
| 0.2.19 | Source | -march=native -O2 | 4.873s |
| 0.2.19 | OpenHPC 1.3.1 | TARGET_ARCH=dynamic -O2 | 19.814s |
| 0.2.19 | SLE 12 | TARGET_ARCH=dynamic -O2 | 21.738s |
| 0.2.20 | Source | -march=native -O2 | 4.818s |
| 0.2.20 | OpenHPC 1.3.2 | TARGET_ARCH=dynamic -O2 | 5.151s |
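If it helps to isolate the LAPACK time from program startup, here's a small self-contained sketch that times a single dsyev call in-process with std::chrono. It uses the same dsyev_ interface as your test code; the fixed size n=300 is just an arbitrary choice for illustration:

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

extern "C" {
void dsyev_(const char* jobz, const char* uplo, int* n,
            double* a, int* lda,
            double* w, double* work, int* lwork, int* info);
}

int main()
{
    int n = 300, lda = n, info = 0, lwork = -1;
    std::vector<double> a(n * n), w(n);

    // Random symmetric matrix with a shifted diagonal, as in the test above.
    srand48(42);
    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i)
            a[i + j * n] = a[j + i * n] = drand48();
        a[j + j * n] += 2.0;
    }

    // Workspace query, then allocate.
    double tmp;
    dsyev_("V", "U", &n, a.data(), &lda, w.data(), &tmp, &lwork, &info);
    lwork = (int)(tmp + 0.1);
    std::vector<double> work(lwork);

    // Time only the diagonalization itself.
    auto t0 = std::chrono::steady_clock::now();
    dsyev_("V", "U", &n, a.data(), &lda, w.data(), work.data(), &lwork, &info);
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "dsyev n=" << n << ": "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s (info=" << info << ")\n";
}
```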
Are these more or less in line with what you are seeing?
Yes, the ratio is more or less what I saw with OpenHPC 1.3.1 on my Sandy Bridge machine. It's interesting that there's such a dramatic difference between 1.3.1 and 1.3.2.