
force lazy evaluation

Open • jangorecki opened this issue 5 years ago • 1 comment

This is a follow-up to our Slack conversation. This FR is about providing an API to force materialization of the results of computations that may not have been materialized yet, simply because the result has not yet been used in a way that requires those computations to be (fully) materialized. This feature will not be very useful for end users, except perhaps in edge cases where the moment a particular computation happens matters. Example API:

dt.persist()

When that could be useful:

ans = dt[f.val > 0, :]
ans.persist()

ans = dt[:, :, join(dt2)]
ans.persist()
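To make the requested behaviour concrete, here is a minimal pure-Python sketch of deferred evaluation with an explicit persist(). The LazyFrame class, its filter() method and the evaluations counter are all hypothetical illustrations, not datatable's implementation or API: a filter only records a predicate, and persist() forces the pending computation once and caches it, so later consumers reuse the result.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


class LazyFrame:
    """Hypothetical toy frame: row filters are recorded, not executed.

    Not datatable's implementation; it only illustrates deferred
    materialization plus an explicit persist() that forces it.
    """

    def __init__(self, columns: Dict[str, List[float]],
                 predicate: Optional[Callable[[Row], bool]] = None):
        self._source = columns        # underlying data, shared, never copied
        self._predicate = predicate   # pending filter, not yet applied
        self._cache: Optional[Dict[str, List[float]]] = None
        self.evaluations = 0          # how many times the filter actually ran

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Returns immediately: no rows are scanned here.
        prev = self._predicate
        if prev is None:
            combined = predicate
        else:
            # Chain the new filter after the pending one.
            combined = lambda row: prev(row) and predicate(row)
        return LazyFrame(self._source, combined)

    def persist(self) -> "LazyFrame":
        # Force the pending computation now, at most once.
        if self._cache is None:
            self.evaluations += 1
            names = list(self._source)
            nrows = len(self._source[names[0]]) if names else 0
            keep = [i for i in range(nrows)
                    if self._predicate is None
                    or self._predicate({n: self._source[n][i] for n in names})]
            self._cache = {n: [self._source[n][i] for i in keep] for n in names}
        return self

    def to_dict(self) -> Dict[str, List[float]]:
        # Any consumer triggers materialization if persist() was not called.
        return self.persist()._cache


frame = LazyFrame({"val": [-1.0, 2.0, 0.0, 5.0]})
ans = frame.filter(lambda row: row["val"] > 0)   # nothing computed yet
ans.persist()                                    # the filter runs here
print(ans.to_dict())                             # reuses the cached result
```

In this sketch the cost of the filter is paid at persist(), so the later to_dict() call is not penalized by it, which is the first guarantee listed in the issue.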

Using .persist(), we can ensure that:

  • any subsequent operation on ans, whatever it is, will not be penalized by the commands issued so far (the computation required by earlier commands happens now, instead of being deferred to the next operation).
  • our machine has enough memory to compute everything requested so far.

Due to the lack of the API requested here, Python datatable currently suffers extra overhead on the db-benchmark join task, where ans.copy(deep=True) is used as a workaround. AFAIU, that not only forces the computation but also allocates a duplicate of ans, which is unnecessary. I'm looking forward to the requested API so we can remove this overhead.
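A toy sketch of the overhead described in the paragraph above, assuming (as the issue does, AFAIU) that copy(deep=True) must first materialize the lazy result and then duplicate it. ToyLazyColumn and its allocation counter are hypothetical, not datatable internals; they only count buffers so the two approaches can be compared.

```python
from typing import Callable, List, Optional


class ToyLazyColumn:
    """Hypothetical lazy filtered column that counts buffer allocations."""

    allocations = 0  # class-wide counter of materialized buffers

    def __init__(self, source: List[int], predicate: Callable[[int], bool]):
        self._source = source
        self._predicate = predicate
        self._data: Optional[List[int]] = None

    def _materialize(self) -> List[int]:
        # Compute the pending filter once; this allocates one buffer.
        if self._data is None:
            ToyLazyColumn.allocations += 1
            self._data = [x for x in self._source if self._predicate(x)]
        return self._data

    def persist(self) -> "ToyLazyColumn":
        # The requested behaviour: force the computation in place.
        self._materialize()
        return self

    def copy(self, deep: bool = False) -> "ToyLazyColumn":
        data = self._materialize()  # the workaround forces computation first...
        clone = ToyLazyColumn(self._source, self._predicate)
        if deep:
            ToyLazyColumn.allocations += 1  # ...then allocates a duplicate buffer
            clone._data = list(data)
        else:
            clone._data = data  # shallow copy shares the buffer
        return clone


ToyLazyColumn.allocations = 0
ToyLazyColumn([-1, 2, 0, 5], lambda x: x > 0).copy(deep=True)
print("deep-copy workaround:", ToyLazyColumn.allocations)  # two buffers

ToyLazyColumn.allocations = 0
ToyLazyColumn([-1, 2, 0, 5], lambda x: x > 0).persist()
print("persist():", ToyLazyColumn.allocations)  # one buffer
```

Under these assumptions the deep-copy workaround pays for one extra allocation and copy of ans, which is the overhead the FR aims to remove.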

jangorecki, May 15 '20 16:05

Any idea whether this FR is addressed by materialize?

jangorecki, Dec 15 '20 10:12