cdotu_ returning 0s and segfault
My C++ code:
#include <iostream>
#include <complex>
#include <cblas.h>
extern "C" void cdotu_(std::complex<float> *res, int *n,
                       const std::complex<float> *x, int *incx,
                       const std::complex<float> *y, int *incy);
int main() {
    int n = 3;
    int incx = 1;
    int incy = 1;
    std::complex<float> x[] = { {1.0f, 2.0f}, {3.0f, 4.0f}, {5.0f, 6.0f} };
    std::complex<float> y[] = { {7.0f, 8.0f}, {9.0f, 10.0f}, {11.0f, 12.0f} };
    std::complex<float> result;
    for (int i = 0; i < n; ++i) {
        std::cout << "x: " << x[i] << ", y: " << y[i] << "\n";
    }
    cdotu_(&result, &n, x, &incx, y, &incy);
    std::cout << "Result of cdotu_: " << result << '\n';
    return 0;
}
Result:
x: (1,2), y: (7,8)
x: (3,4), y: (9,10)
x: (5,6), y: (11,12)
Result of cdotu_: (0,0) # The error
The correct answer is (-39,214), which is what I get when I call cblas_cdotu_sub instead of cdotu_.
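For reference, the expected value can be checked without any BLAS library. The following is a minimal sketch of the unconjugated dot product (the sum of x[i] * y[i]); dotu_reference is a hypothetical helper name used only for illustration, not an OpenBLAS function.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Reference unconjugated dot product: sum of x[i] * y[i], with no conjugation.
// dotu_reference is a hypothetical helper, not part of OpenBLAS or CBLAS.
std::complex<float> dotu_reference(const std::complex<float>* x,
                                   const std::complex<float>* y,
                                   std::size_t n) {
    assert(x != nullptr && y != nullptr);
    std::complex<float> acc{0.0f, 0.0f};
    for (std::size_t i = 0; i < n; ++i) {
        acc += x[i] * y[i];  // plain complex multiply, no std::conj
    }
    return acc;
}
```

Term by term, the inputs above give (-9,22) + (-13,66) + (-17,126) = (-39,214), which matches the cblas_cdotu_sub result.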
One other interesting thing: if I change
- std::complex<float> result;
+ std::complex<float> result = {1.0f, 1.0f};
I get a segfault:
x: (1,2), y: (7,8)
x: (3,4), y: (9,10)
x: (5,6), y: (11,12)
Thread 1 "cdotu_repro" received signal SIGSEGV, Segmentation fault.
0x000073df245f5c40 in cdotu_k_SKYLAKEX () from /lib/libopenblas.so.0
I made a similar simple program using sdot_, and that worked as intended.
Compile Command:
g++ -o cdotu_repro cdotu_repro.cpp -lopenblas
Environment information:
OS: Ubuntu 24.04.2 LTS (x86_64)
G++ version: 13.3.0
OpenBLAS version: 0.3.30 (release)
CPU: Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz
This is a Cascade Lake CPU, which is not listed directly as a supported CPU. Is this why the test does not work? But then why does the cblas_cdotu_sub work and not cdotu_? Or am I calling it incorrectly?
I found this issue when building PyTorch with OpenBLAS. The repro above shows how PyTorch calls cdotu_.
That's probably an argument error in your call; I'll have a look at it later. Generally, the cblas_ function interfaces are specifically intended for calling BLAS from C/C++, while the non-prefixed ones expect Fortran-style argument conventions (because the original reference implementation is written in Fortran).
Cascade Lake not being listed as a unique build target is not a problem, given that it is mostly a refresh of Skylake with no significant instruction set changes relevant to OpenBLAS.
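One possible reading of both symptoms, offered as an assumption that is not confirmed in this thread: if this build's cdotu_ returns the complex value directly and takes n as its first argument, then the &result pointer in the repro would be read as the element count. The sketch below only shows what integer such a callee would see; first_word_as_int is a hypothetical helper, and IEEE-754 floats are assumed.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <cstring>
#include <limits>

// Returns the first four bytes of a std::complex<float> reinterpreted as a
// 32-bit integer, i.e. what a callee would read if it treated a pointer to
// the complex value as `int *n`. Hypothetical helper for illustration only.
std::int32_t first_word_as_int(const std::complex<float>& c) {
    static_assert(std::numeric_limits<float>::is_iec559,
                  "sketch assumes IEEE-754 single precision");
    static_assert(sizeof(std::complex<float>) == 2 * sizeof(float),
                  "std::complex<float> is laid out as {real, imag}");
    std::int32_t word = 0;
    std::memcpy(&word, &c, sizeof(word));  // copies the real part's bytes
    return word;
}
```

Under that assumption, result = {1.0f, 1.0f} would be seen as n = 1065353216 (0x3F800000), so the kernel would read far past the three-element arrays, which is consistent with the SIGSEGV. A result whose memory happens to hold zeros would be seen as n = 0, so nothing is computed and result is never written, which is consistent with the (0,0) output.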
See https://github.com/pytorch/pytorch/pull/143846