
stars_proxy memory hog

Open dazu89 opened this issue 1 year ago • 1 comments

Intending to build a high-dimensional data cube from raster files in plain-text ASCII grid format, I (1) read all files' metadata (file path and attributes) into a data frame, (2) group by dimensions and concatenate the files in each group into a stars_proxy, and (3) summarize/concatenate those stars_proxy objects into a higher-dimensional stars_proxy, similar to the process described in this post on StackExchange or this GitHub issue.
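For reference, a rough sketch of steps (1)-(3); the file paths, attribute names, and dates below are placeholders, not my actual data:

```r
library(stars)
library(dplyr)

# (1) metadata data frame (placeholder paths/attributes)
meta = data.frame(
  path = c("a_t1.asc", "a_t2.asc", "b_t1.asc", "b_t2.asc"),
  attr = c("a", "a", "b", "b"),
  time = as.Date(c("2001-01-01", "2002-01-01",
                   "2001-01-01", "2002-01-01"))
)

# (2) one stars_proxy per attribute group, files stacked along time
proxies = meta |>
  group_by(attr) |>
  group_map(~ read_stars(.x$path, proxy = TRUE,
                         along = list(time = .x$time)))

# (3) combine the per-group proxies into one higher-dimensional proxy
cube = do.call(c, proxies)
```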

Upon loading the stars_proxy via my_star_proxy |> st_as_stars(), memory usage climbs into the tens of GB, even if only a couple of files of 5-10 MB each are read. The problem only occurs with files of the following format:

ncols                   500
nrows                  500
xllcorner              6.5
yllcorner              -65.5
cellsize                 0.002
NODATA_value            -9.9990E+03
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03  0.5000E-02  1.5000E+02 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
...

whereas with standard data no such problem occurs and only a couple hundred MB are used.

library(stars)
library(profmem)
options(profmem.threshold = 1e6)

# baseline: read a sample GeoTIFF fully into memory
tif = system.file("tif/L7_ETMs.tif", package = "stars")
rs_mem = read_stars(tif)
print(object.size(rs_mem), standard = "SI", units = "auto")

# build a higher-dimensional stars_proxy from the same file
r = read_stars(list(a = c(tif, tif), b = c(tif, tif)), proxy = TRUE)
(xx = st_redimension(r, along = list(foo = 1:4)))
(rr = c(xx, xx))
(rrr = st_redimension(rr, along = list(bar = as.Date(c("2001-01-01", "2002-01-01")))))

# profile allocations when materializing the proxy
p <- profmem({
  test = rrr |> st_as_stars()
})
sum(p$bytes, na.rm = TRUE) / 1e6  # total MB allocated

I suspect I should supply some options to the read_stars routine, but so far I have no good guess.
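One untested guess: since each file declares its own NODATA_value, perhaps passing it explicitly spares stars/GDAL some per-file work, e.g. (file name is a placeholder):

```r
library(stars)

# pass the declared NODATA value explicitly instead of letting it be inferred
r = read_stars("grid.asc", proxy = TRUE, NA_value = -9.999e3)
```

I have not verified whether this changes the memory behavior.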

dazu89 avatar Sep 01 '24 22:09 dazu89