Acessing variables from Netcdf from TEMPO satellite
I trying to read vertical_column_troposphere from TEMPO satellite using terra https://search.earthdata.nasa.gov/search?fi=TEMPO&fl=3%2B-%2BGridded%2BObservations&as[instrument][0]=TEMPO
My first try was:
test <- rast(file)
and it give me this message:
Warning message:
[rast] skipped sub-datasets (see 'describe(sds=TRUE)'):
/product/vertical_column_troposphere, /product/vertical_column_troposphere_uncertainty, /product/vertical_column_stratosphere, /product/main_data_quality_flag
, /qa_statistics/num_vertical_column_troposphere_samples, /qa_statistics/min_vertical_column_troposphere_sample, /qa_statistics/max_vertical_column_troposphere_sample
, /qa_statistics/num_vertical_column_troposphere_uncertainty_samples, /qa_statistics/min_vertical_column_troposphere_uncertainty_sample, /qa_statistics/max_vertical_column_troposphere_uncertainty_sample
, /qa_statistics/num_vertical_column_stratosphere_samples, /qa_statistics/min_vertical_column_stratosphere_sample, /qa_statistics/max_vertical_column_stratosphere_sample
, /qa_statistics/num_vertical_column_total_samples, /qa_statistics/min_vertical_column_total_sample, /qa_statistics/max_vertical_column_total_sample
, /geolocation/solar_zenith_angle, /geolocation/viewing_zenith_angle, /geolocation/relative_azim [... truncated]
The SpatRaster is fine (expected projection and extend)
class : SpatRaster
dimensions : 2950, 7750, 1 (nrow, ncol, nlyr)
resolution : 0.02, 0.02 (x, y)
extent : -168, -13, 14, 73 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
source : TEMPO_NO2_L3_V03_20250327T164955Z_S016.nc:weight
varname : weight (sum of area weights)
name : weight
unit : km^2
my next try is to read a different variable
test <- rast(file, "/product/vertical_column_troposphere")
I got this Warning message:
[rast] GDAL did not find an extent. Cells not equally spaced?
However, the resolution, extent and coord. ref. are not correct (they are suppose to be the same from previous reading)
class : SpatRaster
dimensions : 2950, 7750, 1 (nrow, ncol, nlyr)
resolution : 1, 1 (x, y)
extent : 0, 7750, 0, 2950 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
source : vertical_column_troposphere
name : vertical_column_troposphere
unit : molecules/cm^2
time : 2025-03-27 16:50:13 UTC
Is this a bug? how to I fix this?
I need to investigate this a bit more, but this is probably because the file does not follow the CF standard. This seems to be a work-around:
f <- "TEMPO_NO2_L3_V03_20250327T164955Z_S016.nc"
r <- rast(f, "weight")
x <- rast(f, "/product/vertical_column_troposphere")
ext(x) <- ext(r)
crs(x) <- crs(x)
x <- flip(x, "v")
This solution worked here, I will use instead of my workaround ;) Looks like some attribute is included in the global part ("weight" variable) but is not included (or not used) used in this variable group ("/product/vertical_column_troposphere" variable) I'm not sure if there is a way to link or pass this information directly to gdal using rast
Hi @rhijmans
There is another issue I have found using rast() to read TEMPO data, the issue happen in two different Linux clusters (and I have tested a few different versions of terra using conda):
I'm opening the variable "cloud fraction" this time (the reason is because is easy to check if the values are in a range of 0 and 1) from https://search.earthdata.nasa.gov/search/granules?p=C2930727817-LARC_CLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&fi=TEMPO&fl=3%20-%20Gridded%20Observations&as[instrument][0]=TEMPO&tl=1717799728.534!4!!
f <- "TEMPO_CLDO4_L3_V03_20240728T001157Z_S016.nc"
r <- rast(f,"/product/cloud_fraction")
# Warning message:
# [rast] GDAL did not find an extent. Cells not equally spaced?
summary(r)
# cloud_fraction
# Min. :-1.000e+30
# 1st Qu.: 9.969e+36
# Median : 9.969e+36
# Mean : 7.575e+36
# 3rd Qu.: 9.969e+36
# Max. : 9.969e+36
# NA's :187
#Warning message:
#[summary] used a sample
a similar file on my windows desktop give me the expected values:
summary(r)
# cloud_fraction
# Min. :0.00
# 1st Qu.:0.21
# Median :0.45
# Mean :0.51
# 3rd Qu.:0.84
# Max. :1.00
# NA's :76758
#Warning message:
#[summary] used a sample
Do you think is possible to pass some extra arguments from terra::rast() to GDAL that make it work?
Can you post actual urls to the files you use, is that possible? (I know it needs earthdata auth set, but that's available via .netrc or with GDAL_HTTP_HEADERS, and other ways that we can set with terra::setGDALconfig).
In general I would put these subdatasets through the warper, but with terra you can't let GDAL determine the output grid by heuristics, you have to back and forth a bit to get that. (I'm piping up because I was chatting with someone at NASA using these files and looking to work with them a bit).
oh no, sorry what I said was noise ... I see they aren't even model grids it's just messy degenerate rectlinear lon lat 1D arrays. nvm, what @rhijmans said was right, just detect their range and use that to register the extent in longlat
It's the same situation as we laid out in this pangeo post that GHRSST suffers from: https://discourse.pangeo.io/t/fixing-ghrsst-seeking-opinions/3833/7
I'll explore a bit and register this in GDAL itself as a case to detect or provide easy workarounds
I was unable to use the authorization mechs in the usual way ... I think GDAL would want a url like
/vsis3/asdc-prod-protected/TEMPO/TEMPO_CLDO4_L3_V03/2025.05.16/TEMPO_CLDO4_L3_V03_20250516T235813Z_S016.nc'
I'll report back when I figure this out. Happy to pursue specific questions if that's of interest, but I certainly will be getting some helpers to smooth this out, and relate those GDAL-perspectives on to other Python colleagues.
This works better with the new and experimental multi-dim GDAL interface.
library(terra)
#terra 1.8.53
f <- "TEMPO_NO2_L3_V03_20250526T181530Z_S010.nc"
x <- rast(f, "/product/vertical_column_troposphere", drivers="HDF5", md=TRUE)
x
#class : SpatRaster
#size : 2950, 7750, 1 (nrow, ncol, nlyr)
#dimensions : longitude, latitude, time (7750, 2950, 1}
#resolution : 0.02, 0.02 (x, y)
#extent : -168, -13, 14, 73 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
#source : TEMPO_NO2_L3_V03_20250526T181530Z_S010.nc
#varname : /product/vertical_column_troposphere (troposphere nitrogen dioxide vertical column)
#name : vertical_column_troposphere-1
#unit : molecules/cm^2
#time : 2025-05-26 18:15:48 UTC
Note the use of "driver=HDF5". For this file that is important because it speeds up reading the values (I do not know why). Sampling values is still very slow. So currently, the best way to plot these data would be to do
plot(x * 1)
That's sound great!
I'm trying to use the same command here:
x <- rast(f, "/product/vertical_column_troposphere", drivers="HDF5", md=TRUE)
but I'm got this error
Error: [rast (multi)] cannot read multidim from this file or with this driver
But now is working without 'HDF5'!!
x <- rast(f, "/product/vertical_column_troposphere", md=TRUE)
#class : SpatRaster
#size : 2950, 7750, 1 (nrow, ncol, nlyr)
#dimensions : longitude, latitude, time (7750, 2950, 1}
#resolution : 0.02, 0.02 (x, y)
#extent : -168, -13, 14, 73 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
#source : TEMPO_NO2_L3_V03_20250528T191514Z_S011.nc
#varname : /product/vertical_column_troposphere (troposphere nitrogen dioxide vertical column)
#name : vertical_column_troposphere-1
#unit : molecules/cm^2
#time : 2025-05-28 19:15:32 UTC
but my plot(x) did work in my laptop (my R session stalls), tomorrow I will try on a desktop with more memory
Do I need to update other components to make 'HDF5' work?
I did the update for the R-packages Rcpp from CRAN and last terra from github
Nice use of mdim 🙏 fwiw I found some benefit for some datasets by casting from multidim to 2D, because mdim understands relationships between groups better than 2D parts of the drivers. Implemented as vrt:// connection arg transpose for 3.12:
https://gdal.org/en/latest/drivers/raster/vrt.html#vrt-connection-string
I Just did a quick test on a computer with more memory (64GB) and Rstudio is taken a few minutes to plot(x) after using rast() with the option md=TRUE
But to plot(x) using both drivers="HDF5", md=TRUE, RStudio is stalled (I don't see a impact on memory, disks or processing)
On 28 March, @rhijmans wrote:
I need to investigate this a bit more, but this is probably because the file does not follow the CF standard.
The NO2 file is fine and follows the CF Metadata Conventions. The root group defines the coordinate variables (CVs) "longitude", "latitude" and "time" and the data variable "weight". All other data variables are located in subgroups like "product" and refer to the CVs in the root group. This is all ok. It appears that the GDAL standard netCDF driver does not find the CVs of data variables if they are not located in the same group.