
How to normalize data if my dataset does not have an observation for every day?

Open manapshymyr-OB opened this issue 2 years ago • 14 comments

Can you please have a look at the last two comments on this issue https://github.com/VSainteuf/pytorch-psetae/issues/12?

I have a dataset with shapes:

(142, 8, 1048576) (159, 8, 1048576) (151, 8, 1048576)

How should I normalize such a dataset?

manapshymyr-OB avatar Dec 20 '22 02:12 manapshymyr-OB

Hi @manapshymyr-OB, if your dataset has time series of varying lengths, you can normalise with statistics of shape (C,). So you compute the mean and standard deviation of each channel, over all dates and pixels. Cheers

VSainteuf avatar Dec 20 '22 15:12 VSainteuf

@VSainteuf in that case I will have an array with shape 1xC, right?

manapshymyr-OB avatar Dec 21 '22 00:12 manapshymyr-OB

@VSainteuf I think I figured it out (I got 8x1048576, because my image is 1024 by 1024). So for each pixel of each channel, I have a mean and std. Is it okay if my data has a shape such as T x C x pixel_size if I am not going to use dates.json?

manapshymyr-OB avatar Dec 22 '22 14:12 manapshymyr-OB

@VSainteuf in that case I will have an array with shape 1xC, right?

Yes

@VSainteuf I think I figured it out (I got 8x1048576, because my image is 1024 by 1024). So for each pixel of each channel, I have a mean and std. Is it okay if my data has a shape such as T x C x pixel_size if I am not going to use dates.json?

Yes, but you are not processing the complete image at once right ? You are supposed to crop the image with the polygons of each agricultural parcel in your AOI. Then your dataset will have shape NxTxCxS with N the number of parcels, and S the varying number of pixels in each parcel.

VSainteuf avatar Jan 12 '23 10:01 VSainteuf

@VSainteuf in that case I will have an array with shape 1xC, right?

Yes

@VSainteuf I think I figured it out (I got 8x1048576, because my image is 1024 by 1024). So for each pixel of each channel, I have a mean and std. Is it okay if my data has a shape such as T x C x pixel_size if I am not going to use dates.json?

Yes, but you are not processing the complete image at once right ? You are supposed to crop the image with the polygons of each agricultural parcel in your AOI. Then your dataset will have shape NxTxCxS with N the number of parcels, and S the varying number of pixels in each parcel.

I have the Planet dataset, which is already cropped by geometry

manapshymyr-OB avatar May 31 '23 11:05 manapshymyr-OB

OK I'm not sure what the question is anymore, let me know if you need clarification on one point!

VSainteuf avatar May 31 '23 13:05 VSainteuf

I am still confused regarding the normalization shapes. My samples now have different sizes along the temporal dimension, while the channel and pixel dimensions are the same for all of them (because I resampled them to 1024x1024), so each sample is T x 10 x 1048576. I am trying to normalize channel-wise and am not sure about the shape of the mean array. I am concatenating all the .npy files into one array of shape (T x 10 x 1048576) and calculating the mean for each channel. Would it have shape (10,)?

manapshymyr-OB avatar May 31 '23 13:05 manapshymyr-OB

Yes, if you have time series of varying length, the best option is to compute the channel-wise mean and std across all samples and dates. So you end up with mean and std of shape (C,).
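
A minimal sketch of that computation, assuming each sample is a NumPy array of shape (T_i, C, S) where only T_i varies. The function names `channelwise_stats` and `normalize` are hypothetical and not part of this repository. Sums are accumulated sample by sample, so the full concatenated array never has to be held in memory:

```python
import numpy as np

def channelwise_stats(samples):
    """Return per-channel mean and std, each of shape (C,).

    samples: iterable of arrays shaped (T_i, C, S); T_i may differ per sample.
    """
    count = 0
    total = None
    total_sq = None
    for x in samples:
        x = np.asarray(x, dtype=np.float64)
        # Reduce over dates (axis 0) and pixels (axis 2), keep the channel axis.
        s = x.sum(axis=(0, 2))
        sq = (x ** 2).sum(axis=(0, 2))
        total = s if total is None else total + s
        total_sq = sq if total_sq is None else total_sq + sq
        count += x.shape[0] * x.shape[2]
    mean = total / count
    # Population variance: E[x^2] - E[x]^2, clipped against rounding error.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)

def normalize(x, mean, std, eps=1e-8):
    """Normalize one (T, C, S) sample with (C,) statistics via broadcasting."""
    return (x - mean[None, :, None]) / (std[None, :, None] + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy samples with different numbers of dates (hypothetical data).
    data = [rng.normal(size=(39, 10, 16)), rng.normal(size=(50, 10, 16))]
    mean, std = channelwise_stats(data)
    print(mean.shape, std.shape)
```

In practice the samples could be loaded one at a time with `np.load` inside a generator and passed to `channelwise_stats`.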

VSainteuf avatar May 31 '23 13:05 VSainteuf

Yes, if you have time series of varying length, the best option is to compute the channel-wise mean and std across all samples and dates. So you end up with mean and std of shape (C,).

I am getting this error:

stack expects each tensor to be equal size, but got [34, 10, 64] at entry 0 and [46, 10, 64] at entry 1

Is the reason for this error that the samples do not all have the same number of dates? 1.npy may have 39 x 10 x 1048576, while 2.npy has 50 x 10 x 1048576. Is this fine? If not, how should I process these? Should I try https://github.com/VSainteuf/utae-paps instead?
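
For context, this message is what PyTorch's default batching produces when it tries to `torch.stack` samples whose first dimension differs (here 34 and 46 dates). One common way around it, sketched below under assumptions, is a custom `collate_fn` that pads every sequence to the longest one in the batch and returns a mask of the real dates. The name `pad_collate` and the `(x, label)` sample layout are hypothetical. Whether the model in this repository can ignore padded dates is not established in this thread, so the mask is only returned for downstream code to use:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_collate(batch):
    """Collate samples with different numbers of dates.

    batch: list of (x, y) pairs, x of shape (T_i, C, S), y an integer label.
    Returns the padded batch (B, T_max, C, S), a boolean date mask
    (B, T_max) that is True for real dates, and the labels (B,).
    """
    xs, ys = zip(*batch)
    lengths = torch.tensor([x.shape[0] for x in xs])
    # pad_sequence pads along the first (temporal) dimension with zeros.
    padded = pad_sequence(list(xs), batch_first=True, padding_value=0.0)
    # Position t of sample b is real if t < lengths[b].
    date_mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]
    return padded, date_mask, torch.tensor(ys)

if __name__ == "__main__":
    # Toy batch mirroring the shapes in the error message.
    batch = [(torch.randn(34, 10, 64), 0), (torch.randn(46, 10, 64), 1)]
    x, mask, y = pad_collate(batch)
    print(x.shape, mask.shape, y.shape)
```

It would be passed to the loader as `DataLoader(dataset, batch_size=..., collate_fn=pad_collate)`. The alternative is to resample every series to a fixed number of dates before training, which keeps the default collation.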

manapshymyr-OB avatar May 31 '23 17:05 manapshymyr-OB

I am encountering this error during the validation step (I would not ask if I got it during training too...). Error:

File "/home/adminko/PycharmProjects/pytorch-psetae/models/pse.py", line 134, in masked_mean
    out = out * mask
RuntimeError: The size of tensor a (29) must match the size of tensor b (10) at non-singleton dimension 1

So I printed the shapes and got this during training:

Out shape torch.Size([64, 87, 64])
Mask shape torch.Size([87, 64])
Out shape torch.Size([64, 87, 64])
Mask shape torch.Size([87, 64])

and during validation:

Validation . . .
Out shape torch.Size([64, 29, 64])
Mask shape torch.Size([29, 10, 64])

What can be the reason? My sample size is 29x10x1048576 and the mean has shape (10,).

manapshymyr-OB avatar May 31 '23 22:05 manapshymyr-OB

Are you giving different arguments to the train and val data loaders? Any idea why the behaviour is different between train and val?

VSainteuf avatar Jun 01 '23 08:06 VSainteuf

Are you giving different arguments to the train and val data loaders? Any idea why the behaviour is different between train and val?

They are exactly the same. I found out that during training `a` has length 5, while during validation it has length 2. I don't know why. For now, I made a strange workaround. In this code:

    if len(a) == 2:
        out, mask = a
        extra = b
        if len(extra) == 2:
            extra, bm = extra
    else:
        out, mask = a, b

I am now applying out, mask = a, b in both cases, and it runs.

manapshymyr-OB avatar Jun 01 '23 08:06 manapshymyr-OB

Any idea how to work with varying sizes of unordered temporal data?

manapshymyr-OB avatar Jun 01 '23 08:06 manapshymyr-OB

Any idea how to work with varying sizes of unordered temporal data?

@VSainteuf any suggestions?

manapshymyr-OB avatar Jun 03 '23 09:06 manapshymyr-OB