pytorch-psetae
How to normalize data if my dataset is not acquired every day?
Can you please have a look at the last two comments on this issue https://github.com/VSainteuf/pytorch-psetae/issues/12?
I have a dataset with shapes:
(142, 8, 1048576) (159, 8, 1048576) (151, 8, 1048576)
How should I normalize such a dataset?
Hi @manapshymyr-OB, if your dataset has time series of varying lengths, you can normalise with statistics of shape (C,): compute the mean and standard deviation of each channel over all dates and pixels. Cheers
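For reference, a minimal numpy sketch of this (the file names and the C=8 channel count are placeholders matching the shapes above, not part of the repo):

```python
import numpy as np

# Channel-wise statistics over time series of varying length.
# Each sample has shape (T, C, S); T varies, C and S are fixed.
samples = [np.load(f) for f in ["1.npy", "2.npy", "3.npy"]]  # placeholder file names

# Move channels first, flatten dates and pixels, then concatenate everything.
flat = np.concatenate(
    [s.transpose(1, 0, 2).reshape(s.shape[1], -1) for s in samples], axis=1
)  # shape (C, total number of (date, pixel) observations)

mean = flat.mean(axis=1)  # shape (C,)
std = flat.std(axis=1)    # shape (C,)

# Normalise each sample channel-wise by broadcasting over dates and pixels.
normalised = [(s - mean[None, :, None]) / std[None, :, None] for s in samples]
```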
@VSainteuf in that case I will have an array with shape 1xC, right?
Yes
@VSainteuf I think I figured it out (I got 8x1048576 because my image is 1024 by 1024). So for each pixel of each channel, I have a mean and std. Is it okay if my data has a shape such as T x C x pixel_size if I am not going to use dates.json?
Yes, but you are not processing the complete image at once, right? You are supposed to crop the image with the polygons of each agricultural parcel in your AOI. Then your dataset will have shape N x T x C x S, with N the number of parcels and S the varying number of pixels in each parcel.
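If it helps, here is a hedged sketch of that per-parcel cropping step with rasterio and fiona (all file names here are placeholders, not files from this repo):

```python
import fiona
import rasterio
from rasterio.mask import mask

# Cut one (C, H, W) image into per-parcel pixel sets using parcel polygons.
with fiona.open("parcels.shp") as shp:  # placeholder vector file of parcel geometries
    geoms = [feat["geometry"] for feat in shp]

parcels = []
with rasterio.open("image_t0.tif") as src:  # placeholder raster, one per date
    for geom in geoms:
        data, _ = mask(src, [geom], crop=True)    # (C, h, w) window around the parcel
        pixels = data.reshape(data.shape[0], -1)  # (C, S): the parcel's pixel set
        # Pixels outside the polygon are nodata-filled and should be dropped here.
        parcels.append(pixels)

# Repeating this over the T dates and stacking gives one (T, C, S) array per parcel,
# i.e. a dataset of shape N x T x C x S with S varying across parcels.
```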
I have the Planet dataset, which is already cropped by geometry
OK I'm not sure what the question is anymore, let me know if you need clarification on one point!
I am still confused regarding the normalization shapes. Now I have samples of different sizes in the temporal dimension but the same channel and pixel dimensions (the same because I translated them to 1024x1024), so each is T x 10 x 1048576. I am trying to normalize channel-wise and am not sure about the shape of the mean array. I am concatenating all the .npy files into one (T x 10 x 1048576) and calculating the mean for each channel. Would it be of size (10,)?
Yes, if you have time series of varying length, the best option is to compute the channel-wise mean across all samples and dates. So you end up with mean and std of shape (C,).
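A sketch of that computation that streams over the .npy files instead of concatenating them, which matters at 1048576 pixels per date (file names are placeholders):

```python
import numpy as np

files = ["1.npy", "2.npy"]  # placeholder list of (T, C, S) arrays with varying T
total = total_sq = 0.0
count = 0

for f in files:
    x = np.load(f).astype(np.float64)          # (T, C, S)
    total += x.sum(axis=(0, 2))                # per-channel sum over dates and pixels
    total_sq += np.square(x).sum(axis=(0, 2))  # per-channel sum of squares
    count += x.shape[0] * x.shape[2]           # number of (date, pixel) observations

mean = total / count                           # shape (C,)
std = np.sqrt(total_sq / count - mean**2)      # shape (C,)
```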
I am getting this error:
stack expects each tensor to be equal size, but got [34, 10, 64] at entry 0 and [46, 10, 64] at entry 1. Is the reason for this error that the dataset does not have the same temporal resolution? 1.npy may have 39 x 10 x 1048576, while 2.npy has 50 x 10 x 1048576. Is this fine? If not, how should I process these? Should I try https://github.com/VSainteuf/utae-paps instead?
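One common fix for this stack error is a custom collate_fn that pads each batch to its longest sequence, since the default collate can only stack equal-sized tensors. A minimal sketch, assuming each dataset item is a single (T, C, S) tensor (in practice you would also carry the label through):

```python
import torch

def pad_collate(batch):
    # Zero-pad (T, C, S) tensors to the batch's longest T and return a date mask.
    t_max = max(x.shape[0] for x in batch)
    padded, masks = [], []
    for x in batch:
        pad = torch.zeros(t_max - x.shape[0], *x.shape[1:], dtype=x.dtype)
        padded.append(torch.cat([x, pad], dim=0))
        m = torch.zeros(t_max, dtype=torch.bool)
        m[: x.shape[0]] = True  # True on real dates, False on padding
        masks.append(m)
    return torch.stack(padded), torch.stack(masks)

# loader = torch.utils.data.DataLoader(dataset, batch_size=..., collate_fn=pad_collate)
```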
I am encountering this error during the validation step (I would not be asking if it also happened during training...).
Error:
File "/home/adminko/PycharmProjects/pytorch-psetae/models/pse.py", line 134, in masked_mean out = out * mask RuntimeError: The size of tensor a (29) must match the size of tensor b (10) at non-singleton dimension 1
So I printed the shapes and got this during training:
Out shape torch.Size([64, 87, 64])
Mask shape torch.Size([87, 64])
Out shape torch.Size([64, 87, 64])
Mask shape torch.Size([87, 64])
and during the validation:
Validation . . .
Out shape torch.Size([64, 29, 64])
Mask shape torch.Size([29, 10, 64])
What can be the reason?
So my sample size is 29 x 10 x 1048576 and the mean has shape (10,).
Are you giving different arguments to the train and val data loaders? Any idea why the behaviour is different between train and val?
They are exactly the same. I found out that during training the length of a is 5, while on val it equals 2. I don't know why... For now, I made a strange workaround...
```python
if len(a) == 2:
    out, mask = a
    extra = b
    if len(extra) == 2:
        extra, bm = extra
else:
    out, mask = a, b
```
Here I am applying out, mask = a, b for both cases... and it is executing...
Any idea how to work with varying sizes of unordered temporal data?
@VSainteuf any suggestions?