
Pangeo Climate Anomalies example

TomNicholas opened this issue on Jun 22, 2023 · 9 comments

I tried to set up the problem from https://github.com/pangeo-data/distributed-array-examples/issues/4: calculating anomalies with respect to the group mean. I used fake data with the same chunking scheme as the real data, to make it easier to test.
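For context, this is the classic climatological-anomaly pattern: group by a calendar period, then subtract each group's mean. A minimal xarray sketch of the computation, where the grouping key ('time.month') is my assumption rather than something stated above:

# Hedged sketch -- the grouping key is assumed, not taken from the linked issue
gb = ds.groupby('time.month')
anomaly = gb - gb.mean(dim='time')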

I was not expecting this plan though :grimacing:

[image: visualization of the cubed plan produced for this computation]

No idea what's going on here, especially as https://github.com/tomwhite/cubed/issues/145 also has a groupby in it.


I set up the problem like this:

from datetime import datetime, timedelta

import numpy as np
import xarray as xr

import cubed
import cubed.array_api
import cubed.random

# NOTE: `spec` wasn't shown in the original snippet; something like this is assumed:
spec = cubed.Spec(allowed_mem="2GB")

# Hourly timestamps from 1979 to 2022 on a global 0.25-degree grid (ERA5-like)
time = np.arange(datetime(1979, 1, 1), datetime(2022, 1, 1), timedelta(hours=1)).astype('datetime64[ns]')
lat = np.linspace(-90.0, 90.0, 721)[::-1].astype(np.float32)
lon = np.linspace(0.0, 359.8, 1440).astype(np.float32)

def create_cubed_data(t_length):
    # Random float32 data, chunked along time only (31 hourly steps per chunk)
    return xr.DataArray(
        name="asn",
        data=cubed.array_api.astype(
            cubed.random.random((t_length, 721, 1440), chunks=(31, -1, -1), spec=spec),
            np.float32,
        ),
        dims=['time', 'latitude', 'longitude'],
        coords={'time': time[:t_length], 'latitude': lat, 'longitude': lon},
    ).to_dataset()

datasets = {
    '1.5GB': create_cubed_data(372),
    '15GB':  create_cubed_data(3720),
    '150GB': create_cubed_data(37200),
    '1.5TB': create_cubed_data(372000),
}
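(The scale labels follow from the array shapes: 372 × 721 × 1440 float32 values × 4 bytes ≈ 1.5 GB, with each 10× increase in t_length scaling the size accordingly.)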
Checking the dataset sizes:

for scale, ds in datasets.items():
    print(f'{ds.nbytes / 1e9:.2} GB dataset, ')

which prints:

1.5 GB dataset,
1.5e+01 GB dataset,
1.5e+02 GB dataset,
1.5e+03 GB dataset,
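(`workloads` below isn't defined in the snippet; presumably it maps each scale to its anomaly computation. A hypothetical definition, reusing the grouping key assumed earlier:)

# Assumed, not from the original snippet: one anomaly workload per scale
workloads = {
    scale: ds.groupby('time.month') - ds.groupby('time.month').mean(dim='time')
    for scale, ds in datasets.items()
}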
workloads['1.5GB']['asn'].data.visualize()

— TomNicholas, Jun 22 '23 17:06