Pangeo Climate Anomalies example
I tried to set up the problem of calculating the anomaly with respect to the group mean from https://github.com/pangeo-data/distributed-array-examples/issues/4. I used fake data with the same chunking scheme instead of real data to make it easier to test.
I was not expecting this plan though :grimacing:
No idea what's going on here, especially as https://github.com/tomwhite/cubed/issues/145 also has a groupby in it.
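For context, the computation in that issue is an anomaly with respect to a groupby climatology. A minimal NumPy-backed xarray sketch of that pattern (the grouping key and array sizes here are illustrative, not the ones from the issue):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small NumPy-backed stand-in for the real hourly dataset.
time = pd.date_range("2000-01-01", periods=48, freq="h")
ds = xr.Dataset(
    {"asn": (("time", "latitude"), np.random.rand(48, 4).astype(np.float32))},
    coords={"time": time, "latitude": np.arange(4.0)},
)

# Climatology: mean over each hour-of-day group.
clim = ds.groupby("time.hour").mean("time")

# Anomaly: subtract each group's climatological mean from its members.
anomaly = ds.groupby("time.hour") - clim
```

The same two-liner is what gets expensive at scale: the groupby-binary-op has to broadcast the 24-element climatology back against every chunk along `time`.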
I set up the problem like this:
```python
from datetime import datetime, timedelta

import numpy as np
import xarray as xr

import cubed
import cubed.array_api
import cubed.random

# spec was defined elsewhere in the original; a placeholder is used here.
spec = cubed.Spec(allowed_mem="2GB")

time = np.arange(
    datetime(1979, 1, 1), datetime(2022, 1, 1), timedelta(hours=1)
).astype('datetime64[ns]')
lat = np.linspace(-90.0, 90.0, 721)[::-1].astype(np.float32)
lon = np.linspace(0.0, 359.8, 1440).astype(np.float32)

def create_cubed_data(t_length):
    return xr.DataArray(
        name="asn",
        data=cubed.array_api.astype(
            cubed.random.random((t_length, 721, 1440), chunks=(31, -1, -1), spec=spec),
            np.float32,
        ),
        dims=['time', 'latitude', 'longitude'],
        coords={'time': time[:t_length], 'latitude': lat, 'longitude': lon},
    ).to_dataset()

datasets = {
    '1.5GB': create_cubed_data(372),
    '15GB': create_cubed_data(3720),
    '150GB': create_cubed_data(37200),
    '1.5TB': create_cubed_data(372000),
}
```
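As a sanity check on those size labels: each (31, 721, 1440) float32 chunk is about 129 MB, and the totals scale linearly with `t_length`:

```python
import numpy as np

itemsize = np.dtype(np.float32).itemsize  # 4 bytes
chunk_bytes = 31 * 721 * 1440 * itemsize
print(f"chunk size: {chunk_bytes / 1e6:.1f} MB")  # chunk size: 128.7 MB

# Number of chunks along time per dataset (the spatial dims are unchunked).
for label, t_length in [("1.5GB", 372), ("15GB", 3720), ("150GB", 37200), ("1.5TB", 372000)]:
    n_chunks = t_length // 31
    total = t_length * 721 * 1440 * itemsize
    print(f"{label}: {n_chunks} chunks along time, {total / 1e9:.2f} GB")
```

So the 1.5 TB case is 12,000 chunks along time, which is why the plan size matters.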
```python
for scale, ds in datasets.items():
    print(f'{ds.nbytes / 1e9:.2} GB dataset, ')
```

```
1.5 GB dataset,
1.5e+01 GB dataset,
1.5e+02 GB dataset,
1.5e+03 GB dataset
```
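Incidentally, the scientific notation in that output comes from the `:.2` format spec, which means two significant digits rather than two decimal places; something like `:.1f` would print plain decimals:

```python
for gb in (1.5, 15.4, 154.5):
    print(f"{gb:.2}  vs  {gb:.1f}")
# 1.5  vs  1.5
# 1.5e+01  vs  15.4
# 1.5e+02  vs  154.5
```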
```python
datasets['1.5GB']['asn'].data.visualize()
```