read_mdim cannot find an array that the gdalmdiminfo CLI finds just fine?
The following fails with an error:
public_S3_url <- "https://storage.googleapis.com/neon-int-sae-tmp-files/ods/dataproducts/DP4/2020-04-26/GRSM/NEON.D07.GRSM.DP4.00200.001.nsae.2020-04-26.expanded.20230921T200400Z.h5"
co2 <- paste0("/vsicurl/", public_S3_url) |> stars::read_mdim("/GRSM/dp04/data/fluxCo2/nsae")
Error: Cannot find array
But the same array is easily located by gdalmdiminfo, so I don't understand why this error occurs:
system(paste("gdalmdiminfo", paste0("/vsicurl/", public_S3_url), "-array", "/GRSM/dp04/data/fluxCo2/nsae"))
Testing with stars 0.6.4, linking to these libraries:
Linking to GEOS 3.12.0, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
There's some pretty heavy nested grouping going on in that file!
> There's some pretty heavy nested grouping going on in that file!
:rofl: haha, so true. There are so many ridiculous examples of nesting in HDF5 that I sometimes think they intentionally serialize data in the worst way possible. (But then I remember that this is the pre-release format -- the current format is then manually gzip-compressed to an .h5.gz file.) For context, these esoteric creatures are perhaps the flagship data product (eddy covariance measurements from flux towers) of the 81 NEON sites on which our NSF is spending nearly half a billion dollars: https://data.neonscience.org/data-products/DP4.00200.001
I can get it to work with a hack that passes the group nesting separately, giving something like
> stars::read_mdim("/vsicurl/https://storage.googleapis.com/neon-int-sae-tmp-files/ods/dataproducts/DP4/2020-04-26/GRSM/NEON.D07.GRSM.DP4.00200.001.nsae.2020-04-26.expanded.20230921T200400Z.h5", "nsae", groups = c("GRSM", "dp04", "data", "fluxCo2"))
Read failed for array nsae
stars object with 1 dimensions and 1 attribute
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu. Max.
nsae 0 0 0 0 0 0
dimension(s):
from to offset delta
dim0 1 48 0.5 1
Warning message:
In CPL_read_mdim(file, array_name, groups, options, offset, count, :
GDAL Error 1: Array data type is not convertible to buffer data type
indicating that it gets to the array, opens it, but cannot read it.
If you look at the actual values of the array, e.g. with
gdalmdiminfo -detailed /vsicurl/https://storage.googleapis.com/neon-int-sae-tmp-files/ods/dataproducts/DP4/2020-04-26/GRSM/NEON.D07.GRSM.DP4.00200.001.nsae.2020-04-26.expanded.20230921T200400Z.h5 -array /GRSM/dp04/data/fluxCo2/nsae
I think it may become clear why this is (and may remain?) a problem: the array values are database tuples.
Have you tried reading these data with some hdf5 package (h5?) directly?
These commits are probably going to be replaced by the way gdalmdiminfo does this.
> Have you tried reading these data with some hdf5 package (h5?) directly?
rhdf5 subsets this file remotely rather well; I couldn't get this to work with the other packages.
public_S3_url <- "https://storage.googleapis.com/neon-int-sae-tmp-files/ods/dataproducts/DP4/2020-04-26/GRSM/NEON.D07.GRSM.DP4.00200.001.nsae.2020-04-26.expanded.20230921T200400Z.h5"
#listGrp <- rhdf5::h5ls(file = public_S3_url, s3 = TRUE) # slow!
# accessing a given array is fast though
nee <- rhdf5::h5read(file = public_S3_url,
                     name = glue::glue("{site}/dp04/data/fluxCo2/turb",
                                       site = "GRSM"),
                     s3 = TRUE)
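As an aside, rhdf5's documented `index` argument allows partial (hyperslab) reads, so only the requested slices are transferred over the network; whether this also works for this particular compound 1-D dataset is an assumption in the sketch below.

```r
# Sketch: partial read with rhdf5's `index` argument (a list with one
# element per dimension of the dataset). Same public URL as above;
# support for `index` on this compound dataset is assumed, not tested.
nee_head <- rhdf5::h5read(file  = public_S3_url,
                          name  = "GRSM/dp04/data/fluxCo2/turb",
                          index = list(1:10),  # first 10 half-hours only
                          s3    = TRUE)
```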
(rhdf5 can't handle the unfortunate case where these files are additionally gzipped; I know /vsigzip/ can be slow, but it works here. Still, you're probably right that gdalmdiminfo isn't ideal for this!)
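For the gzipped case, GDAL's virtual file systems can be chained, so the .h5.gz files should in principle be readable without a manual decompression step. A sketch with a placeholder URL (not a real NEON file):

```r
# Sketch: chain /vsigzip/ (on-the-fly gzip decompression) with
# /vsicurl/ (HTTP range-request streaming). The URL is hypothetical.
dsn <- paste0("/vsigzip//vsicurl/",
              "https://example.org/NEON.example.nsae.h5.gz")
co2 <- stars::read_mdim(dsn, "/GRSM/dp04/data/fluxCo2/nsae")
```

Note that /vsigzip/ over /vsicurl/ can be slow, since gzip is not seekable and GDAL may need to decompress from the start of the file for each read.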
In other news, I'd love to see proxy support for stars::read_mdim, e.g. for large Zarr archives, e.g. so we can do R examples like https://planetarycomputer.microsoft.com/dataset/daymet-daily-na#Example-Notebook
> In other news, I'd love to see proxy support for stars::read_mdim, e.g. for large Zarr archives
what would you like to see beyond the ability to read one or more slices, or sub-cubes (currently in offset, count and step)?
> what would you like to see beyond the ability to read one or more slices, or sub-cubes (currently in offset, count and step)?
Thanks, and apologies: I probably should have opened a separate issue or Stack Overflow post, as this could just be my lack of understanding.
One part of this is just ergonomics. I imagine I can crop spatial extents by setting the appropriate combination of count and offset, and downsample with step, but this doesn't match the syntax we use elsewhere in stars for things like st_crop(), right? What about operations like aggregation? In Python, the xarray syntax for all of these operations stays entirely consistent whether we're reading Zarr, reading NetCDF, or even reading COGs from STAC.
I think my idealized workflow would actually look something like:
library(spData)
ca <- us_states[us_states$NAME == "California", ] |> st_transform(4326)
library(stars)
url <- "https://daymeteuwest.blob.core.windows.net/daymet-zarr/monthly/na.zarr"
# desired but not functional syntax:
x <- read_stars(url) |> st_crop(ca)
and have a lazy-read object where I can slice times (or other dimensions) on indices. Perhaps my expectations here are just entirely unreasonable. Yes, I know we currently need the GDAL decorators to even get this to parse:
dsn <- glue::glue('ZARR:"/vsicurl/{url}":/tmax')
tmax <- read_stars(dsn)
This works, but it already confuses my students: it isn't the way I previously taught them to select arrays (tmax) in read_stars, and the prefix/quotation syntax causes trouble. I understand that mdim is also the preferred interface for Zarr, but the offset / count / step syntax gets progressively harder to anticipate, and again differs from the way these concepts were introduced previously for stars objects:
x <- read_mdim(dsn, count = c(NA, NA, 492), step=c(4, 4, 12))
The ergonomics are again hard here, because we hardwire numerical values into code that could be obtained from the object metadata but are otherwise not obvious to the user. It also gives me a bunch of warnings:
x <- read_mdim(dsn, count = c(NA, NA, 492), step=c(4, 4, 12))
Read failed for array prcp
Read failed for array swe
Read failed for array tmax
Read failed for array tmin
Read failed for array vp
Warning messages:
1: In CPL_read_mdim(file, array_name, options, offset, count, step, :
GDAL Error 1: arrayStartIdx[0] + (count[0]-1) * arrayStep[0] >= 492
2: In CPL_read_mdim(file, array_name, options, offset, count, step, :
GDAL Error 1: arrayStartIdx[0] + (count[0]-1) * arrayStep[0] >= 492
3: In CPL_read_mdim(file, array_name, options, offset, count, step, :
GDAL Error 1: arrayStartIdx[0] + (count[0]-1) * arrayStep[0] >= 492
4: In CPL_read_mdim(file, array_name, options, offset, count, step, :
GDAL Error 1: arrayStartIdx[0] + (count[0]-1) * arrayStep[0] >= 492
5: In CPL_read_mdim(file, array_name, options, offset, count, step, :
GDAL Error 1: arrayStartIdx[0] + (count[0]-1) * arrayStep[0] >= 492
6: In create_dimension(values = x, is_raster = !sf && i %in% 1:2, point = ifelse(length(x) == :
dimension value(s) non-finite
and the result seems to have an invalid CRS, which I can't override:
> st_crs(x) <- 4326
Error in st_crs.character(x[[xy[1]]]$refsys) : invalid crs: udunits
and attempting to plot it crashes R, even on a machine with 100 GB of RAM.
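For what it's worth, the repeated GDAL error above encodes the constraint offset + (count - 1) * step < dimension size: with step = 12 along a 492-step time dimension, the largest valid count is 41, not 492. A small pure-R check of this arithmetic (the helper name is made up for illustration):

```r
# Indices selected by GDAL mdim-style offset (0-based), count and step,
# expressed as 1-based R indices along one dimension.
mdim_indices <- function(offset = 0, count, step = 1) {
  offset + 1 + step * (seq_len(count) - 1)
}
idx <- mdim_indices(offset = 0, count = 41, step = 12)
max(idx)     # 481, still within the 492 monthly time steps
length(idx)  # 41: every 12th month of the monthly series
```

With count = 492 and step = 12 the last requested index would be 0 + 491 * 12 = 5892, far beyond the 492 available steps, which is exactly what the GDAL message complains about.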
This gives us
> stars::read_mdim("NEON.D07.GRSM.DP4.00200.001.nsae.2020-04-26.expanded.20230921T200400Z.h5", "/GRSM/dp04/data/fluxCo2/nsae")
timeBgn timeEnd flux
1 2020-04-26T00:00:00.000Z 2020-04-26T00:29:59.000Z NaN
2 2020-04-26T00:30:00.000Z 2020-04-26T00:59:59.000Z -0.7405883
3 2020-04-26T01:00:00.000Z 2020-04-26T01:29:59.000Z -0.2856589
4 2020-04-26T01:30:00.000Z 2020-04-26T01:59:59.000Z 2.7275024
5 2020-04-26T02:00:00.000Z 2020-04-26T02:29:59.000Z 14.2427594
6 2020-04-26T02:30:00.000Z 2020-04-26T02:59:59.000Z -4.1640435
7 2020-04-26T03:00:00.000Z 2020-04-26T03:29:59.000Z -23.5961781
8 2020-04-26T03:30:00.000Z 2020-04-26T03:59:59.000Z -38.7475553
9 2020-04-26T04:00:00.000Z 2020-04-26T04:29:59.000Z -5.9633267
10 2020-04-26T04:30:00.000Z 2020-04-26T04:59:59.000Z -1.9345659
11 2020-04-26T05:00:00.000Z 2020-04-26T05:29:59.000Z 5.7533219
12 2020-04-26T05:30:00.000Z 2020-04-26T05:59:59.000Z 3.6831897
13 2020-04-26T06:00:00.000Z 2020-04-26T06:29:59.000Z -5.8330796
14 2020-04-26T06:30:00.000Z 2020-04-26T06:59:59.000Z 2.9189290
15 2020-04-26T07:00:00.000Z 2020-04-26T07:29:59.000Z 3.0636229
16 2020-04-26T07:30:00.000Z 2020-04-26T07:59:59.000Z 3.0182718
17 2020-04-26T08:00:00.000Z 2020-04-26T08:29:59.000Z 2.9283239
18 2020-04-26T08:30:00.000Z 2020-04-26T08:59:59.000Z -3.0542519
19 2020-04-26T09:00:00.000Z 2020-04-26T09:29:59.000Z 5.5374485
20 2020-04-26T09:30:00.000Z 2020-04-26T09:59:59.000Z 0.2513057
21 2020-04-26T10:00:00.000Z 2020-04-26T10:29:59.000Z 0.6168280
22 2020-04-26T10:30:00.000Z 2020-04-26T10:59:59.000Z -5.4693606
23 2020-04-26T11:00:00.000Z 2020-04-26T11:29:59.000Z 0.3674658
24 2020-04-26T11:30:00.000Z 2020-04-26T11:59:59.000Z 1.4159392
25 2020-04-26T12:00:00.000Z 2020-04-26T12:29:59.000Z -1.0238224
26 2020-04-26T12:30:00.000Z 2020-04-26T12:59:59.000Z 0.7054844
27 2020-04-26T13:00:00.000Z 2020-04-26T13:29:59.000Z -8.9968282
28 2020-04-26T13:30:00.000Z 2020-04-26T13:59:59.000Z 12.3038478
29 2020-04-26T14:00:00.000Z 2020-04-26T14:29:59.000Z -3.2665178
30 2020-04-26T14:30:00.000Z 2020-04-26T14:59:59.000Z -3.2718385
31 2020-04-26T15:00:00.000Z 2020-04-26T15:29:59.000Z -10.0929240
32 2020-04-26T15:30:00.000Z 2020-04-26T15:59:59.000Z -5.9556013
33 2020-04-26T16:00:00.000Z 2020-04-26T16:29:59.000Z -8.2911191
34 2020-04-26T16:30:00.000Z 2020-04-26T16:59:59.000Z -5.0356033
35 2020-04-26T17:00:00.000Z 2020-04-26T17:29:59.000Z -8.3811585
36 2020-04-26T17:30:00.000Z 2020-04-26T17:59:59.000Z -1.6936340
37 2020-04-26T18:00:00.000Z 2020-04-26T18:29:59.000Z 14.2182930
38 2020-04-26T18:30:00.000Z 2020-04-26T18:59:59.000Z -10.0673130
39 2020-04-26T19:00:00.000Z 2020-04-26T19:29:59.000Z -1.8299035
40 2020-04-26T19:30:00.000Z 2020-04-26T19:59:59.000Z -0.7272567
41 2020-04-26T20:00:00.000Z 2020-04-26T20:29:59.000Z 8.0110202
42 2020-04-26T20:30:00.000Z 2020-04-26T20:59:59.000Z NaN
43 2020-04-26T21:00:00.000Z 2020-04-26T21:29:59.000Z 12.1214397
44 2020-04-26T21:30:00.000Z 2020-04-26T21:59:59.000Z -2.1407618
45 2020-04-26T22:00:00.000Z 2020-04-26T22:29:59.000Z -65.3029157
46 2020-04-26T22:30:00.000Z 2020-04-26T22:59:59.000Z 0.4240327
47 2020-04-26T23:00:00.000Z 2020-04-26T23:29:59.000Z -0.8366531
48 2020-04-26T23:30:00.000Z 2020-04-26T23:59:59.000Z -12.9685483
although it still needs a fix if the numeric data is not Float64.
I'll come back to your other comments - I agree, interfaces and usability are important.
This hides the offset, count and step arguments behind st_crop() or [ calls, giving for example
library(sf)
# Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(spData)
# CRS from https://daac.ornl.gov/DAYMET/guides/Daymet_V4_Monthly_Climatology.html:
lcc = st_crs("+proj=lcc +lat_1=25 +lat_2=60 +lat_0=42.5 +lon_0=-100 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs")
ca <- us_states[us_states$NAME=="California",] |> st_transform(lcc)
library(stars)
# Loading required package: abind
url <- "https://daymeteuwest.blob.core.windows.net/daymet-zarr/monthly/na.zarr"
dsn <- glue::glue('ZARR:"/vsicurl/{url}"') # all arrays
(p <- read_mdim(dsn, proxy = TRUE)) # read only array names and dimensions
# stars_proxy object with 5 attributes in 5 file(s):
# $prcp
# [1] "[...]/na.zarr"
# $swe
# [1] "[...]/na.zarr"
# $tmax
# [1] "[...]/na.zarr"
# $tmin
# [1] "[...]/na.zarr"
# $vp
# [1] "[...]/na.zarr"
# dimension(s):
# from to offset delta refsys values x/y
# x 1 7814 -4560750 1000 NA NULL [x]
# y 1 8075 4984500 -1000 NA NULL [y]
# time 1 492 NA NA POSIXct 1980-01-16 12:00:00,...,2020-12-16
p["tmax",,,1:20] |> # select one array and 20 time steps
st_set_crs(lcc) |> # p doesn't have a CRS; set it to that of ca
st_crop(ca) -> crp # crop the ca area
hook = function() plot(st_geometry(ca), col = NA, border = 'orange', add = TRUE) # plot border
plot(crp, hook = hook)
# downsample set to 7
Hi,
I am running into an error on the GDAL side when trying to run the example above with the latest sf and stars installed:
#> Error 1: Decompressor blosc not handled
A quick search revealed the same error message earlier in two pangeo threads (https://github.com/pangeo-data/pangeo/issues/196 and https://github.com/pangeo-forge/pangeo-forge-recipes/issues/58), but it is still obscure to me what is causing this.
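One way to check whether this is a build issue is to inspect the Zarr driver metadata of the GDAL build that sf/stars link against; if blosc support was compiled in, it should appear among the listed compression methods (a sketch -- the exact metadata layout may vary by GDAL version):

```shell
# Print GDAL's Zarr driver metadata and look for blosc among the
# compression methods this build supports.
gdalinfo --format Zarr | grep -i blosc
```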
Reprex
remotes::install_github("r-spatial/sf")
#> Skipping install of 'sf' from a github remote, the SHA1 (6e13be91) has not changed since last install.
#> Use `force = TRUE` to force installation
remotes::install_github("r-spatial/stars")
#> Skipping install of 'stars' from a github remote, the SHA1 (b1be4573) has not changed since last install.
#> Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.12.2, GDAL 3.9.1, PROJ 9.4.1; sf_use_s2() is TRUE
library(spData)
lcc = st_crs("+proj=lcc +lat_1=25 +lat_2=60 +lat_0=42.5 +lon_0=-100 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs")
ca <- us_states[us_states$NAME=="California",] |> st_transform(lcc)
library(stars)
#> Loading required package: abind
url <- "https://daymeteuwest.blob.core.windows.net/daymet-zarr/monthly/na.zarr"
dsn <- glue::glue('ZARR:"/vsicurl/{url}"') # all arrays
(p <- read_mdim(dsn, proxy = TRUE)) # read only array names and dimensions
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Error in which(sapply(x, function(i) inherits(i$values, "sfc"))): argument to 'which' is not logical
p["tmax",,,1:20] |> # select one array and 20 time steps
st_set_crs(lcc) |> # p doesn't have a CRS; set it to that of ca
st_crop(ca) -> crp # crop the ca area
#> Error in eval(expr, envir, enclos): object 'p' not found
hook = function() plot(st_geometry(ca), col = NA, border = 'orange', add = TRUE) # plot border
plot(crp, hook = hook)
#> Error in eval(expr, envir, enclos): object 'crp' not found
Created on 2024-08-12 with reprex v2.1.0
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os Ubuntu 22.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2024-08-12
#> pandoc 3.2 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind * 1.4-5 2016-07-21 [1] CRAN (R 4.4.1)
#> class 7.3-22 2023-05-03 [2] CRAN (R 4.4.1)
#> classInt 0.4-10 2023-09-05 [1] CRAN (R 4.4.1)
#> cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
#> curl 5.2.1 2024-03-01 [1] RSPM (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [1] RSPM (R 4.4.0)
#> digest 0.6.36 2024-06-23 [1] RSPM (R 4.4.0)
#> e1071 1.7-14 2023-12-06 [1] CRAN (R 4.4.1)
#> evaluate 0.24.0 2024-06-10 [1] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] RSPM (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#> KernSmooth 2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
#> knitr 1.48 2024-07-07 [1] RSPM (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.1)
#> lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.4.0)
#> proxy 0.4-27 2022-06-09 [1] CRAN (R 4.4.1)
#> purrr 1.0.2 2023-08-10 [1] RSPM (R 4.4.0)
#> R.cache 0.16.0 2022-07-21 [1] RSPM (R 4.4.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] RSPM (R 4.4.0)
#> R.oo 1.26.0 2024-01-24 [1] RSPM (R 4.4.0)
#> R.utils 2.12.3 2023-11-18 [1] RSPM (R 4.4.0)
#> Rcpp 1.0.13 2024-07-17 [1] RSPM (R 4.4.0)
#> remotes 2.5.0 2024-03-17 [1] RSPM (R 4.4.0)
#> reprex 2.1.0 2024-01-11 [1] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [1] RSPM (R 4.4.0)
#> rmarkdown 2.27 2024-05-17 [1] RSPM (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [1] RSPM (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.4.0)
#> sf * 1.0-17 2024-08-12 [1] Github (r-spatial/sf@6e13be9)
#> sp 2.1-4 2024-04-30 [1] CRAN (R 4.4.1)
#> spData * 2.3.1 2024-05-31 [1] RSPM (R 4.4.0)
#> spDataLarge 2.1.1 2024-08-12 [1] local
#> stars * 0.6-7 2024-08-12 [1] Github (r-spatial/stars@b1be457)
#> styler 1.10.3 2024-04-07 [1] RSPM (R 4.4.0)
#> units 0.8-5 2023-11-28 [1] CRAN (R 4.4.1)
#> vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.4.0)
#> withr 3.0.1 2024-07-31 [1] RSPM (R 4.4.0)
#> xfun 0.46 2024-07-18 [1] RSPM (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [1] RSPM (R 4.4.0)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
@goergen95 #564 and #663 are related.
This is really awesome! I am currently trying to replicate some analysis with vector data cubes done by the folks at Earthmover in a recent blog post using xarray. I still face some challenges interpreting ERA5 longitude values (between 0-360°), and st_extract() does not seem to work as efficiently as one might hope, compared to the time it takes to generate the plot (which is amazingly fast).
remotes::install_github("r-spatial/stars", ref = "b1be457")
#> Skipping install of 'stars' from a github remote, the SHA1 (b1be4573) has not changed since last install.
#> Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.12.2, GDAL 3.9.1, PROJ 9.4.1; sf_use_s2() is TRUE
library(stars)
#> Loading required package: abind
library(viridis)
#> Loading required package: viridisLite
# read zarr
url <- "https://storage.googleapis.com/weatherbench2/datasets/era5/1959-2022-full_37-6h-0p25deg-chunk-1.zarr-v2/"
dsn <- glue::glue('ZARR:"/vsicurl/{url}"')
era5_ds <- read_mdim(dsn, proxy = TRUE, variable = c("2m_temperature", "10m_u_component_of_wind"))
# see https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference
era5_ds <- st_set_dimensions(era5_ds, "longitude", values = st_get_dimension_values(era5_ds, "longitude") - 180)
# subset
start <- as.POSIXct("2017-01-01 00:00:00 UTC")
end <- as.POSIXct("2018-02-01 00:00:00 UTC")
time <- st_get_dimension_values(era5_ds, "time")
(era5_ds_sub <- era5_ds[,,,which(time >= start & time < end)])
#> stars_proxy object with 2 attributes in 2 file(s):
#> $`2m_temperature`
#> [1] "[...]/"
#>
#> $`10m_u_component_of_wind`
#> [1] "[...]/"
#>
#> dimension(s):
#> from to offset delta refsys x/y
#> longitude 1 1440 -180 0.25 NA [x]
#> latitude 1 721 90.12 -0.25 NA [y]
#> time 84741 86324 1959-01-01 UTC 6 hours POSIXct
# plot
era5_ds_sub["2m_temperature",,,1583, drop = TRUE] |> # does not drop time
plot(axes = TRUE, key.pos = 4, col = viridis_pal()) # how to put time as title?
#> Warning in st_is_longlat(x): bounding box has potentially an invalid value
#> range for longlat data
#> Warning in st_is_longlat(x): bounding box has potentially an invalid value
#> range for longlat data
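On the longitude handling above: simply subtracting 180 appears to relabel the axis rather than rotate the grid, so values may end up paired with the wrong meridians (which would also explain the bounding-box warnings). The usual wrap from 0-360° to -180-180° is ((lon + 180) %% 360) - 180, after which the arrays would also need reordering along x. A pure-R check of the formula (helper name made up):

```r
# Wrap ERA5-style 0-360 longitudes into the -180..180 convention.
wrap_lon <- function(lon) ((lon + 180) %% 360) - 180
wrap_lon(c(0, 90, 180, 270, 359.75))
# 0 and 90 stay put; 180 maps to -180; 270 to -90; 359.75 to -0.25
```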
# cities
parquet <- "https://huggingface.co/api/datasets/jamescalam/world-cities-geo/parquet/default/train/0.parquet"
cities <- read_sf(parquet)
cities <- st_as_sf(cities,coords = c("longitude", "latitude"), crs = "EPSG:4326")
(cities_eur <- subset(cities, continent == "Europe"))
#> Simple feature collection with 2844 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -25.66667 ymin: 28 xmax: 39.74 ymax: 67.85572
#> Geodetic CRS: WGS 84
#> # A tibble: 2,844 × 8
#> city country region continent x y z geometry
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <POINT [°]>
#> 1 Tirana Albania South… Europe 4501. 1622. 4207. (19.81889 41.3275)
#> 2 Durres Albania South… Europe 4512. 1593. 4207. (19.44139 41.32306)
#> 3 Elbasan Albania South… Europe 4508. 1648. 4189. (20.08222 41.1125)
#> 4 Vlore Albania South… Europe 4569. 1617. 4135. (19.48972 40.46667)
#> 5 Shkoder Albania South… Europe 4458. 1580. 4269. (19.51258 42.06828)
#> 6 Fier-Ci… Albania South… Europe 4550. 1617. 4156. (19.56667 40.71667)
#> 7 Korce Albania South… Europe 4521. 1716. 4148. (20.78083 40.61861)
#> 8 Berat Albania South… Europe 4540. 1648. 4155. (19.95222 40.70583)
#> 9 Lushnje Albania South… Europe 4531. 1623. 4175. (19.705 40.94194)
#> 10 Kavaje Albania South… Europe 4518. 1605. 4195. (19.55694 41.18556)
#> # ℹ 2,834 more rows
# extraction does not seem to work efficiently yet
era5_ds_sub <- st_set_crs(era5_ds_sub, st_crs(cities_eur))
(era5_ds_sub_eur <- st_crop(era5_ds_sub, cities_eur))
#> stars_proxy object with 2 attributes in 2 file(s):
#> $`2m_temperature`
#> [1] "[...]/"
#>
#> $`10m_u_component_of_wind`
#> [1] "[...]/"
#>
#> dimension(s):
#> from to offset delta refsys x/y
#> longitude 618 879 -180 0.25 WGS 84 [x]
#> latitude 90 249 90.12 -0.25 WGS 84 [y]
#> time 84741 86324 1959-01-01 UTC 6 hours POSIXct
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x5d1d5ce5c118>
#>
#> This object has pending lazy operations: dimensions as printed may not reflect this.
# era5_cities_eur <- st_extract(era5_ds_sub_eur[,,,1], cities_eur[1,])
Created on 2024-08-14 with reprex v2.1.0