subset lazyarray into lazyarray?

Open chrisdane opened this issue 2 years ago • 1 comment

Hi Zhengjia Wang

Thanks a lot for your great lazyarray package. I have a question (this is not an issue).

Is it possible for subsetting a lazyarray to yield another lazyarray?

I am not sure whether I am using your package correctly, e.g.

# `arr` from readme.md
inds <- arr > 0.5 # error
inds <- arr[] > 0.5

During this call, does arr[] load the whole array into memory, i.e. is the lazy aspect lost entirely?

Thanks a lot for any answer and kind regards, Chris

chrisdane, Apr 06 '23 07:04

Hi @chrisdane, development of this package has been paused in favor of https://github.com/dipterix/filearray , a very similar package that offers better performance and more functions. This package (lazyarray) is still on CRAN because some of my old projects still depend on it, but the migration will be complete soon. I'm sorry for the inconvenience.

Back to your question. For now it's not straightforward to subset a lazyarray/filearray in that way, because I'm dealing with arrays of 10GB+. The operations you propose might need to create a new array on disk, which could very easily fill up the hard disk if not handled carefully.

It's true that once you call [, the data will be loaded into memory, hence the "lazy" aspect goes away.
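
If the goal is only to avoid holding the full array in memory, one workaround is to process it one slice at a time. The snippet below is a minimal sketch, not package code: a plain in-memory array stands in for the on-disk array, and it assumes that a partial subset such as `arr[, , k]` reads only that slice.

```r
# Stand-in for an on-disk array (assumption): a plain in-memory 3-D array.
set.seed(1)
arr <- array(runif(4 * 3 * 5), dim = c(4, 3, 5))

# Count values above 0.5 one slice at a time, so only a single
# 4 x 3 slice is materialized per iteration instead of the full array.
n_above <- 0
for (k in seq_len(dim(arr)[3])) {
  slice <- arr[, , k]                    # extract only the k-th slice
  n_above <- n_above + sum(slice > 0.5)  # reduce it, then discard it
}
n_above
```

This only helps when the result of each step is small (a count, a sum, a handful of indices); keeping the full logical mask would again require an array of the same shape.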

What I could do, however, is set up lazily evaluated proxies. The proxies do not evaluate the arrays immediately. Instead, they only evaluate when you subset them:

# No evaluation, inds is just a proxy array
inds <- arr > 0.5

# evaluates `arr>0.5` on the fly
inds[,,1]

# or 
arr[inds]
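
To make the idea more concrete, here is a rough base-R sketch of such a proxy. It is not lazyarray or filearray code: `lazy_gt` and the class name `lazy_proxy` are hypothetical, a plain in-memory array stands in for the on-disk array, and the `arr > 0.5` and `arr[inds]` forms above are not covered.

```r
# Stand-in for an on-disk array (assumption): a plain in-memory array.
set.seed(1)
arr <- array(runif(4 * 3 * 5), dim = c(4, 3, 5))

# Hypothetical constructor: stores the source array and the threshold,
# but does not compute the comparison yet.
lazy_gt <- function(x, threshold) {
  structure(list(x = x, threshold = threshold), class = "lazy_proxy")
}

# Subsetting the proxy subsets the source first, then applies the
# comparison to the extracted part only.
`[.lazy_proxy` <- function(p, ...) {
  src <- unclass(p)
  src$x[...] > src$threshold
}

inds <- lazy_gt(arr, 0.5)   # no evaluation here, inds is just a proxy
inds[, , 1]                 # evaluates `arr[, , 1] > 0.5` on the fly
```

The comparison is deferred by storing its inputs rather than its result, so the cost is paid per subset and no new array is written to disk.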

Would that resolve your problem?

I'm copying this issue to filearray and will post updates there: https://github.com/dipterix/filearray/issues/5

dipterix, Apr 06 '23 13:04