Add generic vector/matrix operations
Overloads some operators for the `vector_t` and `matrix_t` classes, and adds some additional math routines such as `cadd2`, `device_cadd2` (`a(i) = b(i) + c`) and `device_add3` (`a(i) = b(i) + c(i)`).
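For reference, the two element-wise formulas above can be sketched in plain Fortran as follows. This is only an illustration of the arithmetic on the CPU: the `_sketch` routine names, the argument order and the chosen intents are assumptions made for this example, not the signatures added in this PR.

```fortran
program cadd2_add3_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(kind=dp) :: a(n), b(n), c(n)
  integer :: i

  b = [(real(i, dp), i = 1, n)]   ! b = 1, 2, 3, 4
  c = 10.0_dp

  ! a(i) = b(i) + s, with s a scalar
  call cadd2_sketch(a, b, 0.5_dp, n)
  if (any(abs(a - (b + 0.5_dp)) > 1.0e-12_dp)) error stop 'cadd2 mismatch'

  ! a(i) = b(i) + c(i)
  call add3_sketch(a, b, c, n)
  if (any(abs(a - (b + c)) > 1.0e-12_dp)) error stop 'add3 mismatch'

  print *, 'ok'

contains

  ! Hypothetical stand-in for cadd2: vector plus scalar, stored in a.
  subroutine cadd2_sketch(a, b, s, n)
    integer, intent(in) :: n
    real(kind=dp), intent(out) :: a(n)
    real(kind=dp), intent(in) :: b(n)
    real(kind=dp), intent(in) :: s
    integer :: i
    do i = 1, n
       a(i) = b(i) + s
    end do
  end subroutine cadd2_sketch

  ! Hypothetical stand-in for add3: vector plus vector, stored in a.
  subroutine add3_sketch(a, b, c, n)
    integer, intent(in) :: n
    real(kind=dp), intent(out) :: a(n)
    real(kind=dp), intent(in) :: b(n), c(n)
    integer :: i
    do i = 1, n
       a(i) = b(i) + c(i)
    end do
  end subroutine add3_sketch

end program cadd2_add3_sketch
```

The `device_` variants apply the same formulas to device memory.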
This is to facilitate the use of `vector_t` and `matrix_t`, especially on GPUs. It should not break anything since it only adds to what we have; the only thing I changed is the intent of some arguments in `sub3` and `add3`. @njansson, will this be a problem?
Tested on CPUs and on NVIDIA and AMD GPUs. Not tested with OpenCL.
Note that the assignment operator behaves differently from the other operations. With `v = w`, if `v%x` is already allocated, we free `v` and re-initialize it to the same size as `w`. All the other operations assume that in `v = a + b`, an already allocated `v` has the same size as `a` and `b`. So there is no implicit reallocation except in the assignment operator.
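The difference between the two behaviours can be sketched with a toy type. Everything here (`vec_t`, `vec_init`, `vec_add`, and the choice to stop on a size mismatch or to allocate an unallocated result) is hypothetical and written only to illustrate the rule; it is not the `vector_t` implementation.

```fortran
module vec_sketch
  implicit none
  private
  public :: vec_t, vec_init, vec_add

  integer, parameter, public :: dp = kind(1.0d0)

  ! Toy stand-in for a vector type with an allocatable data array x.
  type :: vec_t
     real(kind=dp), allocatable :: x(:)
   contains
     procedure, private :: assign_vec
     generic :: assignment(=) => assign_vec
  end type vec_t

contains

  subroutine vec_init(v, n)
    type(vec_t), intent(inout) :: v
    integer, intent(in) :: n
    if (allocated(v%x)) deallocate(v%x)
    allocate(v%x(n))
    v%x = 0.0_dp
  end subroutine vec_init

  ! v = w: free v if needed, then re-initialize it to the size of w.
  subroutine assign_vec(v, w)
    class(vec_t), intent(inout) :: v
    type(vec_t), intent(in) :: w
    if (allocated(v%x)) deallocate(v%x)
    allocate(v%x(size(w%x)))
    v%x = w%x
  end subroutine assign_vec

  ! v = a + b: an allocated v must already match a and b; no reallocation.
  subroutine vec_add(v, a, b)
    type(vec_t), intent(inout) :: v
    type(vec_t), intent(in) :: a, b
    if (size(a%x) /= size(b%x)) error stop 'operand size mismatch'
    if (allocated(v%x)) then
       if (size(v%x) /= size(a%x)) error stop 'result size mismatch'
    else
       ! Assumption for this sketch: an unallocated result is allocated.
       allocate(v%x(size(a%x)))
    end if
    v%x = a%x + b%x
  end subroutine vec_add

end module vec_sketch

program vec_sketch_demo
  use vec_sketch
  implicit none
  type(vec_t) :: v, w, a, b

  call vec_init(v, 2)
  call vec_init(w, 5)
  w%x = 1.0_dp

  v = w                      ! v is freed and re-initialized to size 5
  if (size(v%x) /= 5) error stop 'assignment did not resize v'

  call vec_init(a, 5)
  call vec_init(b, 5)
  a%x = 2.0_dp
  b%x = 3.0_dp

  call vec_add(v, a, b)      ! sizes match, so v is reused as is
  if (any(abs(v%x - 5.0_dp) > 1.0e-12_dp)) error stop 'unexpected sum'

  print *, 'ok'
end program vec_sketch_demo
```

In this sketch, calling `vec_add` with a `v` of a different size stops with an error rather than resizing, which mirrors the "no implicit reallocation" rule described above.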