terra icon indicating copy to clipboard operation
terra copied to clipboard

Acessing variables from Netcdf from TEMPO satellite

Open Schuch666 opened this issue 8 months ago • 12 comments

I trying to read vertical_column_troposphere from TEMPO satellite using terra https://search.earthdata.nasa.gov/search?fi=TEMPO&fl=3%2B-%2BGridded%2BObservations&as[instrument][0]=TEMPO

My first try was: test <- rast(file) and it give me this message:

Warning message:
[rast] skipped sub-datasets (see 'describe(sds=TRUE)'):
/product/vertical_column_troposphere, /product/vertical_column_troposphere_uncertainty, /product/vertical_column_stratosphere, /product/main_data_quality_flag
, /qa_statistics/num_vertical_column_troposphere_samples, /qa_statistics/min_vertical_column_troposphere_sample, /qa_statistics/max_vertical_column_troposphere_sample
, /qa_statistics/num_vertical_column_troposphere_uncertainty_samples, /qa_statistics/min_vertical_column_troposphere_uncertainty_sample, /qa_statistics/max_vertical_column_troposphere_uncertainty_sample
, /qa_statistics/num_vertical_column_stratosphere_samples, /qa_statistics/min_vertical_column_stratosphere_sample, /qa_statistics/max_vertical_column_stratosphere_sample
, /qa_statistics/num_vertical_column_total_samples, /qa_statistics/min_vertical_column_total_sample, /qa_statistics/max_vertical_column_total_sample
, /geolocation/solar_zenith_angle, /geolocation/viewing_zenith_angle, /geolocation/relative_azim [... truncated] 

The SpatRaster is fine (expected projection and extend)

class       : SpatRaster 
dimensions  : 2950, 7750, 1  (nrow, ncol, nlyr)
resolution  : 0.02, 0.02  (x, y)
extent      : -168, -13, 14, 73  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
source      : TEMPO_NO2_L3_V03_20250327T164955Z_S016.nc:weight 
varname     : weight (sum of area weights) 
name        : weight 
unit        :   km^2

my next try is to read a different variable test <- rast(file, "/product/vertical_column_troposphere") I got this Warning message: [rast] GDAL did not find an extent. Cells not equally spaced?

However, the resolution, extent and coord. ref. are not correct (they are suppose to be the same from previous reading)

class       : SpatRaster 
dimensions  : 2950, 7750, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 7750, 0, 2950  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
source      : vertical_column_troposphere 
name        : vertical_column_troposphere 
unit        :              molecules/cm^2 
time        : 2025-03-27 16:50:13 UTC 

Is this a bug? how to I fix this?

Schuch666 avatar Mar 28 '25 16:03 Schuch666

I need to investigate this a bit more, but this is probably because the file does not follow the CF standard. This seems to be a work-around:

f <- "TEMPO_NO2_L3_V03_20250327T164955Z_S016.nc"
r <- rast(f, "weight")

x <- rast(f, "/product/vertical_column_troposphere")
ext(x) <- ext(r)
crs(x) <- crs(x)
x <- flip(x, "v")

rhijmans avatar Mar 28 '25 21:03 rhijmans

This solution worked here, I will use instead of my workaround ;) Looks like some attribute is included in the global part ("weight" variable) but is not included (or not used) used in this variable group ("/product/vertical_column_troposphere" variable) I'm not sure if there is a way to link or pass this information directly to gdal using rast

Schuch666 avatar Mar 31 '25 14:03 Schuch666

Hi @rhijmans

There is another issue I have found using rast() to read TEMPO data, the issue happen in two different Linux clusters (and I have tested a few different versions of terra using conda):

I'm opening the variable "cloud fraction" this time (the reason is because is easy to check if the values are in a range of 0 and 1) from https://search.earthdata.nasa.gov/search/granules?p=C2930727817-LARC_CLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&fi=TEMPO&fl=3%20-%20Gridded%20Observations&as[instrument][0]=TEMPO&tl=1717799728.534!4!!

f <- "TEMPO_CLDO4_L3_V03_20240728T001157Z_S016.nc"
r <- rast(f,"/product/cloud_fraction")
# Warning message:
# [rast] GDAL did not find an extent. Cells not equally spaced?
summary(r)
# cloud_fraction
# Min.   :-1.000e+30
# 1st Qu.: 9.969e+36
# Median : 9.969e+36
# Mean   : 7.575e+36
# 3rd Qu.: 9.969e+36
# Max.   : 9.969e+36
# NA's   :187
#Warning message:
#[summary] used a sample 

a similar file on my windows desktop give me the expected values:

summary(r)
# cloud_fraction 
# Min.   :0.00   
# 1st Qu.:0.21   
# Median :0.45   
# Mean   :0.51   
# 3rd Qu.:0.84   
# Max.   :1.00   
# NA's   :76758  
#Warning message:
#[summary] used a sample

Do you think is possible to pass some extra arguments from terra::rast() to GDAL that make it work?

Schuch666 avatar Apr 15 '25 21:04 Schuch666

Can you post actual urls to the files you use, is that possible? (I know it needs earthdata auth set, but that's available via .netrc or with GDAL_HTTP_HEADERS, and other ways that we can set with terra::setGDALconfig).

In general I would put these subdatasets through the warper, but with terra you can't let GDAL determine the output grid by heuristics, you have to back and forth a bit to get that. (I'm piping up because I was chatting with someone at NASA using these files and looking to work with them a bit).

mdsumner avatar May 18 '25 07:05 mdsumner

oh no, sorry what I said was noise ... I see they aren't even model grids it's just messy degenerate rectlinear lon lat 1D arrays. nvm, what @rhijmans said was right, just detect their range and use that to register the extent in longlat

It's the same situation as we laid out in this pangeo post that GHRSST suffers from: https://discourse.pangeo.io/t/fixing-ghrsst-seeking-opinions/3833/7

I'll explore a bit and register this in GDAL itself as a case to detect or provide easy workarounds

mdsumner avatar May 18 '25 07:05 mdsumner

I was unable to use the authorization mechs in the usual way ... I think GDAL would want a url like

/vsis3/asdc-prod-protected/TEMPO/TEMPO_CLDO4_L3_V03/2025.05.16/TEMPO_CLDO4_L3_V03_20250516T235813Z_S016.nc'

I'll report back when I figure this out. Happy to pursue specific questions if that's of interest, but I certainly will be getting some helpers to smooth this out, and relate those GDAL-perspectives on to other Python colleagues.

mdsumner avatar May 18 '25 09:05 mdsumner

This works better with the new and experimental multi-dim GDAL interface.

library(terra)
#terra 1.8.53
f <- "TEMPO_NO2_L3_V03_20250526T181530Z_S010.nc"     
x <- rast(f, "/product/vertical_column_troposphere", drivers="HDF5", md=TRUE)
x
#class       : SpatRaster 
#size        : 2950, 7750, 1  (nrow, ncol, nlyr)
#dimensions  : longitude, latitude, time (7750, 2950, 1}
#resolution  : 0.02, 0.02  (x, y)
#extent      : -168, -13, 14, 73  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source      : TEMPO_NO2_L3_V03_20250526T181530Z_S010.nc 
#varname     : /product/vertical_column_troposphere (troposphere nitrogen dioxide vertical column) 
#name        : vertical_column_troposphere-1 
#unit        :                molecules/cm^2 
#time        : 2025-05-26 18:15:48 UTC 

Note the use of "driver=HDF5". For this file that is important because it speeds up reading the values (I do not know why). Sampling values is still very slow. So currently, the best way to plot these data would be to do

plot(x * 1)

rhijmans avatar May 28 '25 18:05 rhijmans

That's sound great!

I'm trying to use the same command here: x <- rast(f, "/product/vertical_column_troposphere", drivers="HDF5", md=TRUE) but I'm got this error Error: [rast (multi)] cannot read multidim from this file or with this driver

But now is working without 'HDF5'!!

x <- rast(f, "/product/vertical_column_troposphere", md=TRUE)
#class       : SpatRaster 
#size        : 2950, 7750, 1  (nrow, ncol, nlyr)
#dimensions  : longitude, latitude, time (7750, 2950, 1}
#resolution  : 0.02, 0.02  (x, y)
#extent      : -168, -13, 14, 73  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source      : TEMPO_NO2_L3_V03_20250528T191514Z_S011.nc 
#varname     : /product/vertical_column_troposphere (troposphere nitrogen dioxide vertical column) 
#name        : vertical_column_troposphere-1 
#unit        :                molecules/cm^2 
#time        : 2025-05-28 19:15:32 UTC

but my plot(x) did work in my laptop (my R session stalls), tomorrow I will try on a desktop with more memory

Schuch666 avatar May 29 '25 02:05 Schuch666

Do I need to update other components to make 'HDF5' work? I did the update for the R-packages Rcpp from CRAN and last terra from github

Schuch666 avatar May 29 '25 02:05 Schuch666

Nice use of mdim 🙏 fwiw I found some benefit for some datasets by casting from multidim to 2D, because mdim understands relationships between groups better than 2D parts of the drivers. Implemented as vrt:// connection arg transpose for 3.12:

https://gdal.org/en/latest/drivers/raster/vrt.html#vrt-connection-string

mdsumner avatar May 29 '25 02:05 mdsumner

I Just did a quick test on a computer with more memory (64GB) and Rstudio is taken a few minutes to plot(x) after using rast() with the option md=TRUE

But to plot(x) using both drivers="HDF5", md=TRUE, RStudio is stalled (I don't see a impact on memory, disks or processing)

Schuch666 avatar Jun 03 '25 22:06 Schuch666

On 28 March, @rhijmans wrote:

I need to investigate this a bit more, but this is probably because the file does not follow the CF standard.

The NO2 file is fine and follows the CF Metadata Conventions. The root group defines the coordinate variables (CVs) "longitude", "latitude" and "time" and the data variable "weight". All other data variables are located in subgroups like "product" and refer to the CVs in the root group. This is all ok. It appears that the GDAL standard netCDF driver does not find the CVs of data variables if they are not located in the same group.

pvanlaake avatar Aug 14 '25 09:08 pvanlaake