xMIP icon indicating copy to clipboard operation
xMIP copied to clipboard

Improved 'filling' of missing coordinate values

Open jbusecke opened this issue 4 years ago • 4 comments

@raphaeldussin suggested some methods to fill/extrapolate coordinate values over at (https://github.com/xgcm/xgcm/issues/292#issuecomment-754751213) that might be better than the current logic used here.

This might help with #66 too.

jbusecke avatar Jan 05 '21 17:01 jbusecke

I have experimented a bit with gridfill and HCtFlood but I am not sure that the results convince me.

I loaded the data that caused problems in #66:

# Load a cmip6 dataset with missing values in lon/lat
import xarray as xr
import intake
import matplotlib.pyplot as plt
import numpy as np

from cmip6_preprocessing.preprocessing import combined_preprocessing
url = "https://cmip6.storage.googleapis.com/pangeo-cmip6.json"

col = intake.open_esm_datastore(url)
model = 'CESM2-FV2'
query = dict(experiment_id=['historical'], table_id='Omon', 
             variable_id='tos', grid_label=['gn'], source_id=model)
cat = col.search(**query)
print(cat.df['source_id'].unique())
z_kwargs = {'consolidated': True, 'decode_times':False}
tos_dict = cat.to_dataset_dict(
    zarr_kwargs=z_kwargs,
    preprocess=None,
    aggregate=False
)
ds = tos_dict['CMIP.NCAR.CESM2-FV2.historical.r1i1p1f1.Omon.tos.gn.gs://cmip6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Omon/tos/gn/.nan.20191120']

the lon and lat fields have missing values, which I mask out as nans

lon = ds.lon
lat = ds.lat
masked_lon = lon.where(lon<1000)
masked_lat = lat.where(lat<1000)

lon_data = masked_lon.load().data
lon_data_masked = np.ma.masked_array(data=lon_data, mask=np.isnan(lon_data))
lat_data = masked_lat.load().data
lat_data_masked = np.ma.masked_array(data=lat_data, mask=np.isnan(lat_data))

plt.figure(figsize=[10,5])
plt.subplot(1,2,1)
plt.imshow(lon_data_masked)
plt.colorbar()

plt.subplot(1,2,2)
plt.imshow(lat_data_masked)
plt.colorbar()
image

HCtFlood

import HCtFlood.kara as kara
lon_kara = ds.lon
lon_kara_filled = kara.flood_kara_ma(lon_data_masked, spval=1e35)
lat_kara = ds.lat
lat_kara_filled = kara.flood_kara_ma(lat_data_masked, spval=1e35)

plt.figure(figsize=[10,5])
plt.subplot(1,2,1)
plt.imshow(lon_kara_filled)
plt.colorbar()

plt.subplot(1,2,2)
plt.imshow(np.diff(lon_kara_filled, axis=1),vmin=0,vmax=2)
plt.colorbar()
image

This doesnt look to bad visually, but you can see on the right that the longitude values are not monotonic, which causes all kind of issues with cartopy plotting

image

gridfill

import gridfill
filled_lon, converged_lon = gridfill.fill(lon_data_masked, 1, 0, 0.01, itermax=1e4, cyclic=True)
filled_lat, converged_lat = gridfill.fill(lat_data_masked, 1, 0, 0.01, itermax=1e4, cyclic=True)
print(converged_lon, converged_lat)

plt.figure(figsize=[10,5])
plt.subplot(1,2,1)
plt.imshow(filled_lon)
plt.colorbar()

plt.subplot(1,2,2)
plt.imshow(np.diff(filled_lon, axis=1),vmin=0,vmax=2)
plt.colorbar()
image

This looks great in all the areas that are not near the longitude discontinuity, but those non monotonic regions also seem to upset cartopy in a major fashion...

image

For now I am pretty sure that I do not want to make this step part of cmip6_preprocessing unless we can get this to work seemlessly with standard plotting methods.

I also have to admit I blindly used these methods without knowing much about them. If there are ways to tune the output, I would be very interested.

cc @raphaeldussin

jbusecke avatar Jan 29 '21 20:01 jbusecke

yeah that does not work great, I wonder if it would be better with a combination of xarray.bfill and xarray.ffill along the axis perpendicular to the axis that varies most (i.e. y axis for lon, x for lat)

raphaeldussin avatar Jan 29 '21 20:01 raphaeldussin

pop_tools has a lateral_fill algorithm you could try.

cc @ALDepp

dcherian avatar Jan 29 '21 20:01 dcherian

Thanks for the tips. I will definitely come back to this later!

jbusecke avatar Jan 29 '21 20:01 jbusecke