scikit-cuda

dot computation fails when stride contains equal values

Open pjotrp opened this issue 9 years ago • 6 comments

The checks at https://github.com/lebedov/scikits.cuda/blob/master/scikits/cuda/linalg.py#L464 and https://github.com/lebedov/scikits.cuda/blob/master/scikits/cuda/linalg.py#L466 can fail when one matrix has two equal strides: a valid stride of (8,8) leads to either a_f_order=False or b_f_order=False, which may not match the value computed for the other operand.

As a test, I disabled the checks and the computation ran fine; the result matched numpy.dot.

pjotrp avatar May 15 '15 21:05 pjotrp

@untom, can you please fix this?

lebedov avatar May 17 '15 02:05 lebedov

I may put in a fix this week.

pjotrp avatar May 17 '15 02:05 pjotrp

I think the problem is as follows: scikits.cuda tries to infer whether a matrix is in C or Fortran order from its strides. When the strides are equal, there is no way to 'guess' that orientation correctly. Arguably, raising an exception is the right thing to do, though the current errors are a bit misleading.
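To illustrate the ambiguity, here is a minimal sketch using plain numpy arrays on the host. The helper `guess_order_from_strides` is hypothetical and is not the code in linalg.py; it only mimics the idea of inferring orientation from 2-D strides.

```python
import numpy as np


def guess_order_from_strides(strides):
    """Hypothetical helper: guess memory order of a 2-D array from strides alone."""
    if strides[0] > strides[1]:
        return "C"  # rows are further apart than columns
    if strides[0] < strides[1]:
        return "F"  # columns are further apart than rows
    return None  # equal strides: the order cannot be inferred


a = np.zeros((5, 1), dtype=np.float64)  # column vector in matrix form
b = np.zeros((5, 3), dtype=np.float64)  # ordinary C-ordered matrix
c = np.asfortranarray(b)                # same shape, Fortran order

for name, arr in (("a", a), ("b", b), ("c", c)):
    print(name, arr.strides, guess_order_from_strides(arr.strides))
```

The (5, 1) array has strides (8, 8), so a strides-only check has nothing to go on.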

Why are we not using numpy's built-in orientation flags? Something to do with earlier versions of numpy? Using the flags would be unambiguous.

pjotrp avatar May 17 '15 12:05 pjotrp

There are no numpy orientation flags. The objects are pycuda.gpuarray, not numpy.ndarray. `pycuda.gpuarray.flags` only has flags for `f_contiguous` and `c_contiguous`, and both are false for a strided array (since it isn't contiguous anymore). So there is no reliable way that I am aware of to determine whether the array used to be f-order or c-order. I didn't consider the case of equal strides when I first wrote the code. No idea what the correct thing to do in this situation might be.

untom avatar May 17 '15 13:05 untom

One option would be the following: first, check whether either `f_contiguous` or `c_contiguous` is True. Only use the strides to determine orientation when both are false. Does that solve the test cases you have?
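A rough sketch of that idea, not an actual patch: `detect_order` is a hypothetical name, and numpy arrays stand in for GPU arrays here on the assumption that the array object exposes `.flags.c_contiguous`, `.flags.f_contiguous` and `.strides`.

```python
import numpy as np


def detect_order(arr):
    """Hypothetical helper: prefer contiguity flags, fall back to strides."""
    if arr.flags.c_contiguous:
        return "C"
    if arr.flags.f_contiguous:
        return "F"
    # Neither flag is set (e.g. a strided view): fall back to the strides.
    s0, s1 = arr.strides
    if s0 > s1:
        return "C"
    if s0 < s1:
        return "F"
    raise ValueError("equal strides and no contiguity flag: order is ambiguous")


col = np.zeros((5, 1))                             # equal strides, but flagged contiguous
f_mat = np.asfortranarray(np.zeros((4, 3)))        # Fortran-ordered matrix
c_view = np.zeros((6, 6))[::2, ::2]                # non-contiguous view of a C array
f_view = np.zeros((6, 6), order="F")[::2, ::2]     # non-contiguous view of an F array

for arr in (col, f_mat, c_view, f_view):
    print(arr.shape, arr.strides, detect_order(arr))
```

With this ordering, the strides are only consulted when the flags give no answer, and the equal-stride case raises an explicit error instead of a misleading one.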

untom avatar May 17 '15 13:05 untom

When the strides are equal it is always a matrix with column width 1 (i.e., a vector in matrix form). I worked around it by using numpy.dot for these specific cases (no reason to send that to GPU anyway). You could do the same.
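Roughly, the workaround looks like the sketch below. The names `is_vector_like` and `dot_with_fallback` are made up for illustration, and `gpu_dot` is a placeholder for whatever GPU routine is normally called; it is an assumption, not part of any library API.

```python
import numpy as np


def is_vector_like(a):
    """True for a 2-D array with a dimension of length 1 (a vector in matrix form)."""
    return a.ndim == 2 and 1 in a.shape


def dot_with_fallback(a, b, gpu_dot=None):
    """Hypothetical wrapper: keep vector-like products on the CPU.

    gpu_dot is a placeholder callable taking two host arrays; when it is
    None, or when either operand is vector-like, numpy.dot is used instead.
    """
    if gpu_dot is None or is_vector_like(a) or is_vector_like(b):
        return np.dot(a, b)
    return gpu_dot(a, b)


x = np.ones((3, 1))
y = np.ones((1, 3))
print(dot_with_fallback(x, y))
```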

Note that numpy does have flags for f_contiguous and c_contiguous. See http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.ndarray.flags.html. Intriguingly, gpu_array flags are out of sync when converted from numpy. I am looking into that now. Will raise a separate issue.

pjotrp avatar May 18 '15 16:05 pjotrp