PySPOD
Weights shape error
Hello,
When I run SPOD in parallel with a custom weighting matrix (the area of the elements), I get the following error, but everything is fine when I run in serial mode. Do you have any idea what causes this?
ValueError: parameter ``weights`` must be cast into 1d array with dimension equal to flattened spatial dimension of data.
What's the shape of your weights? Can you try passing weights.reshape(-1) instead?
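For readers following along, here is a minimal sketch of what that suggestion does. The array is a small made-up stand-in (4 points, 5 variables), not the actual mesh weights: reshape(-1) collapses it to the 1D layout that the error message asks for.

```python
import numpy as np

# Hypothetical small case: 4 spatial points, 5 variables per point.
weights = np.ones((4, 5))

# reshape(-1) flattens the array to 1D in row-major (C) order,
# so the length becomes npts * nvars.
flat = weights.reshape(-1)
print(flat.shape)
```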
Thank you, now I'm able to run with 2-3 ranks, but I get the same error when I scale this up. For reference, my weights shape is:
(1394730,)
and my time series data shape is
(200, 278946, 5)
The error arises from this piece of code:
def distribute_dimension(data, max_axis, comm):
    """
    Distribute desired spatial dimension, splitting partitions
    by value // comm.size, with remainder = value % comm.size
    """
    ## distribute largest spatial dimension based on data
    if comm is not None:
        size = comm.size
        rank = comm.rank
        shape = data.shape
        index = [np.s_[:]] * len(shape)
        N = shape[max_axis]
        n, s = _blockdist(N, size, rank)
        index[max_axis] = np.s_[s:s+n]
        index = tuple(index)
        data = data[index]
        comm.Barrier()
    else:
        data = data
    return data
Best
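One possible way a mismatch like this could show up only at higher rank counts, sketched under assumptions: block_dist below is a hypothetical stand-in for _blockdist (whose body is not shown in this thread) that hands the remainder to the lowest ranks. The sketch compares the local size a rank gets when the data is split by points and then multiplied by the number of variables against the local size when the flattened weights are split directly. It is an illustration only and has not been checked against what PySPOD does internally.

```python
def block_dist(N, size, rank):
    """Hypothetical stand-in for _blockdist: return (count, start) for a rank.

    Assumption: the N % size leftover items go to the lowest ranks.
    """
    q, r = divmod(N, size)
    n = q + (1 if rank < r else 0)
    s = rank * q + min(rank, r)
    return n, s


def local_sizes(npts, nvars, size):
    # Split by points first, then count nvars values per local point.
    per_point = [block_dist(npts, size, r)[0] * nvars for r in range(size)]
    # Split the flattened (npts * nvars) array directly.
    flat = [block_dist(npts * nvars, size, r)[0] for r in range(size)]
    return per_point, flat


# Shapes from this thread: 278946 points, 5 variables.
for size in (2, 3, 4):
    per_point, flat = local_sizes(278946, 5, size)
    print(size, per_point == flat)
```

With these shapes, the two splits agree on 2 and 3 ranks, where 278946 divides evenly, and disagree on 4 ranks, where it does not.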
and my time series data shape is
(200, 278946, 5)
So do you have 200 time samples, each comprising 278946 spatial points with 5 variables per point?
I think the weights correspond just to spatial points and not to variables, so you should provide 278946 weights, not 278946 * 5 = 1394730. @mrogowski Can you confirm?
We should support weight per spatial point per variable. Looking quickly at the code, I think we may have a bug. We tested the one variable branch heavily in parallel, but not so much for data with multiple variables. @FrankFrank9, what is the format of your data? Could you come up with a simple reproducer?
Unfortunately I can't make an easy reproducible thing. I guess anything with those shapes should work. It is an error in redistributing data. Let me know
Can you try to run with this change in PySPOD?
I get the same error:
ValueError: cannot reshape array of size 139473 into shape (139475,1)
During handling of the above exception, another exception occurred:
Unfortunately I can't make an easy reproducible thing.
Not even using random data with shapes that match your data?
I generated random data:
data matrix X (200, 278946, 5)
weights (278946, 1, 5)
and tried with 7, 8, 9, 10, 11, 12 processes. All seem to have worked. Any reproducer would be very helpful to assist you.
Now it works. The weights need the second axis as well; mine were just (npts, nvars). Thanks for looking into this!
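A minimal sketch of that reshaping, for anyone hitting the same thing: it turns weights of shape (npts, nvars) into (npts, 1, nvars) by inserting a singleton axis. The sizes are small made-up values, and the variable names are illustrative rather than taken from PySPOD.

```python
import numpy as np

# Small made-up sizes standing in for (278946, 5).
npts, nvars = 6, 5
weights_2d = np.random.rand(npts, nvars)

# Insert a singleton second axis: (npts, nvars) -> (npts, 1, nvars).
weights_3d = weights_2d[:, np.newaxis, :]
print(weights_3d.shape)
```

The values are unchanged. Only the shape differs, so weights_3d[:, 0, :] gives back the original 2D array.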
Oh, but then that means we can do better, that is, add the missing axis, right Marcin?
Now it works. The weights need the second axis as well; mine were just (npts, nvars). Thanks for looking into this!
Good to hear! Like I said before, most of the runs we did so far were for 1 variable 2D data, so you may spot some issues with 1D and/or multivariable data. Let us know and we'll try to fix it.
Oh, but then that means we can do better, that is, add the missing axis, right Marcin?
I'll try to reproduce the issue that @FrankFrank9 ran into and fix it. I used (278946, 1, 5) because that's what I got from utils_weights.geo_trapz_2D. It just happens that it was the problem.
Thanks a lot! If I find any other issue I'll post here
Best
I'll try to reproduce the issue that @FrankFrank9 ran into and fix it.
I couldn't - it worked for me with (278946, 5) weights as well.
At this point I don't know. The version I was using when I hit the error came from pip install pyspod. Is it the same version?
pip install pyspod would install the last published version, which does not contain this fix. You'd need to pip install git+https://github.com/MathEXLab/PySPOD@refs/pull/48/head or manually clone the repo from the PR and pip install it.
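To check which version is actually installed in the environment you are running from, a small sketch using only the standard library is below. The helper name installed_version is made up for this example, and the distribution name "pyspod" is taken from the pip command above. Note that a version string alone may not distinguish a PR build from the published release.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("pyspod"))
```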