argopy
argopy copied to clipboard
New method to fetch data along a given trajectory
Motivation
For some Argo data analysis it could be useful to be able to fetch data around
some specific locations, eg:
-
compare some float data with neighbor floats (any QC would do this, but also sensor dev. see eg from @matdever #168 )
-
co-localise Argo float profiles with some Huricane path (eg here from Kim Wood )
it could be useful to have an access point that directly fetch this.
API for new access point
It could look like this:
from argopy import DataFetcher as ArgoDataFetcher
# Default temporal distance is 'days' and radial distance unit is 'degree':
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], dt=365, dr=1) # All float trajectory
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], cyc=[12], dt=365, dr=1) # Single profile
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], cyc=[12,13,14], dt=365, dr=1) # Selected profiles
# Possibly distinguish zonal and meridional distances:
neighbor_fetcher = ArgoDataFetcher(ds='ref').around(wmo=[6903754], dt=30, dx=2, dy=1)
# Use option to change units:
neighbor_fetcher = ArgoDataFetcher(ds='ref').around(wmo=[6903754], dt=30, dx=100, dy=50, unit='km')
# Get data/index the classic way:
neighbor_ds = neighbor_fetcher.load().data
# argopy would add new variables to the fetched data, like distances to the requested reference profiles:
neighbor_ds['distance_time']
neighbor_ds['distance_radial']
neighbor_ds['distance_zonal']
neighbor_ds['distance_meridional']
This new access point, could in fact take a path or trajectory as input:
from argopy import DataFetcher as ArgoDataFetcher
# [2021 Hurricane Larry](https://www.nhc.noaa.gov/data/tcr/index.php?season=2021&basin=atl)
traj = [[-26.00,12.00,'2021-09-01 12:00:00'],[-33.00,13.00,'2021-09-02 12:00:00'],[-40.00,14.00,'2021-09-03 12:00:00'],[-45.00,16.00,'2021-09-04 12:00:00'],[-49.00,19.00,'2021-09-05 12:00:00'],[-52.00,21.00,'2021-09-06 12:00:00'],[-55.00,24.00,'2021-09-07 12:00:00'],[-57.00,27.00,'2021-09-08 12:00:00'],[-61.00,31.00,'2021-09-09 12:00:00'],[-61.00,38.00,'2021-09-10 12:00:00'],[-49.00,52.00,'2021-09-11 12:00:00']]
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(path=traj, dt=5, dr=50, unit='km')
Further
Note this API could easily be plugged into the argo
access point:
from argopy import DataFetcher as ArgoDataFetcher
float_fetcher = ArgoDataFetcher(ds='phy').float(6903754)
float_ds = fetcher.load().data
neighbor_ds = float_ds.argo.around(dt=365, dr=1)
neighbor_ds = float_ds.argo.around(dt=365, dx=2, dy=1)
neighbor_ds = float_ds.argo.around(dt=365, dx=100, dy=50, unit='km')
neighbor_ds = float_ds.argo.around(dt=365, dr=1, ds='ref') # Fetch data from the Argo CTD reference
@gmaze This would be a great feature and is actually the reason why issue #168 came up: I was querying the index file too often because I was looking for near-by floats along a specific float's trajectory by using:
argo = ArgoDataFetcher().profile(WMO,profnumber).to_xarray()
d = {'profile_number': argo.CYCLE_NUMBER,
'PSAL': argo.PSAL,
'TEMP': argo.TEMP,
'PRES': argo.PRES,
'time': argo.TIME,
'lon': argo.LONGITUDE,
'lat': argo.LATITUDE,
}
df = pd.DataFrame(data=d)
argo = argo_loader.region([df.lon[0]-.5, df.lon[0]+.5, df.lat[0]-.5, df.lat[0]+.5, 1500, 2100, '2020-01-01', '2022-01-01']).to_xarray()
d = {'profile_number': argo.CYCLE_NUMBER,
'PSAL': argo.PSAL,
'TEMP': argo.TEMP,
'PRES': argo.PRES,
'time': argo.TIME,
'lon': argo.LONGITUDE,
'lat': argo.LATITUDE,
'WMO': argo.PLATFORM_NUMBER
}
sbe = pd.DataFrame(data=d)
where df.lon[0] is the longitude of the float of interest at a specific profile.
The additional variables are also a great touch, I currently add the absolute distance from the reference profile using
sbe['dist'] = hs.haversine((df.lat[0],df.lon[0]),(sbe.lat[ii],sbe.lon[ii]))
Caveats
Some thoughts should be put into avoiding duplicates, otherwise the array could unnecessarily grow. Also, in my current approach, I have to manually remove the float of interest from the returned array using
sbe.drop(sbe[sbe['WMO'] == int(WMO)].index, inplace = True)
Great !
Avoiding duplicates may indeed be included 👍
I tried the following code to fetch data in a 2 years, 1x1 degree box around each profiles. Using cache is key for performances
WMO = [6903075]
profnumber = np.arange(2,10)
fetcher = ArgoDataFetcher(cache=True).profile(WMO, profnumber)
argo_ds = fetcher.to_xarray()
argo_ds = argo_ds.argo.point2profile()
fig, ax = plt.subplots(nrows=1, ncols=1)
ax.plot(argo_ds['LONGITUDE'], argo_ds['LATITUDE'], 'r+-', markersize=12)
for i_prof in argo_ds['N_PROF']:
this_prof = argo_ds.sel(N_PROF=i_prof)
lon = this_prof['LONGITUDE'].values[np.newaxis][0]
lat = this_prof['LATITUDE'].values[np.newaxis][0]
tim = this_prof['TIME'].values[np.newaxis][0]
bbox = list(np.add([lon, lon, lat, lat, 1500, 2000, tim, tim],
[-.5, .5, -.5, .5, 0, 0, -np.timedelta64(12*30, 'D'), np.timedelta64(12*30, 'D')]))
bbox[-1] = pd.to_datetime(str(bbox[-1])).strftime('%Y%m%d%H%M')
bbox[-2] = pd.to_datetime(str(bbox[-2])).strftime('%Y%m%d%H%M')
try:
fetcher = ArgoDataFetcher(cache=True, ds='phy').region(bbox)
local_ds = fetcher.load().data
local_ds = local_ds.argo.point2profile()
except Exception:
print(fetcher.uri)
pass
ax.plot(local_ds['LONGITUDE'], local_ds['LATITUDE'], 'k.')
ax.hlines(bbox[2:4], bbox[0], bbox[1], color='black', linewidth=0.5)
ax.vlines(bbox[0:2], bbox[2], bbox[3], color='black', linewidth=0.5)
ax.text(lon, lat+0.01, '%i' % this_prof['CYCLE_NUMBER'], horizontalalignment='center')
ax.grid()
ax.set_title(WMO)
Clearly, the overlapping boxes is something we should avoid
Clearly, the overlapping boxes is something we should avoid
It's maybe the time to think about a "polygon" region fetching ?
It's maybe the time to think about a "polygon" region fetching ?
Something like that is in the works for ERDDAP and should be in the next version. However, that won't help much here b/c we don't know when the argo ERDDAP server would be updated and argopy supports multiple sources of data that are non-ERDDAP. Anyway, just wanted to mention that at least this may be easier in the future.
Indeed a "polygon" selection would be great also, I think argovis has already one
However, let's imagine the case where 2 profiles are far away from each other, i.e. more precisely that boxes do not overlap:
In this case a polygon would not be able to answer the around access point purpose
A polygon selection would be useful for some application I reckon. For the problem here (i.e., fetching data along a float's trajectory), I thought that filtering out duplicates would suffice. It would solve the issue of overlapping boxes, wouldn't it? The only problem it adds is how to reference the profiles to one another (as in, one profile could be in 2 boxes).
I tried to draw it as an example to be clearer:
Float 1 is easy - no problem there Float 2 is trickier. Profiles 2 and 3 could be used for Profile C and profiles 3 and 4 for profile D. If one filters out duplicates, only profiles 2, 3, and 4 will be left, but no indication to what profiles they are matched to.
Does that make sense?
I thought that filtering out duplicates would suffice
Indeed, but at the expense of generating more data transfer than necessary, since all the post-processing/filtering would be done on the client side, I would try to avoid this
Not sure this would be much of a problem if the time range is reasonable though
I tried to draw it as an example to be clearer:
Nice draw !! 😸
but no indication to what profiles they are matched to.
This issue would be solved by the distance metric I guess For float 2, profile 4 is too far from C, and profile 2 is too from D ...
This issue was marked as staled automatically because it has not seen any activity in 90 days
This issue was marked as staled automatically because it has not seen any activity in 90 days
This issue was closed automatically because it has not seen any activity in 365 days
This issue was marked as staled automatically because it has not seen any activity in 90 days