datatable
force lazy evaluation
This is a follow-up to our Slack conversation. This FR is about providing an API to force materialization of the results of computations that may not have been materialized yet, simply because the result has not yet been used in a way that requires those computations to be (fully) materialized. This feature will not be very useful for end users, except perhaps in some edge cases where the moment at which a particular computation happens matters. Example API:
dt.persist()
When that could be useful:
ans = dt[f.val > 0, :]
ans.persist()
ans = dt[:, :, join(dt2)]
ans.persist()
Using .persist() we can ensure that:
- any subsequent operation on ans, whatever it is, will not be penalized by the commands issued so far (the computation is moved to the moment persist() is called instead of the moment ans is next used);
- our machine has enough memory to compute everything we have asked of it so far.
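The first guarantee can be illustrated with a toy model in plain Python. The `LazyFrame` class, its `persist()` and `to_list()` methods, and the `filter_positive()` helper are hypothetical and are not part of datatable; this is only a sketch of the requested semantics, assuming a filter is recorded when the expression is built and executed on first use.

```python
# Hypothetical toy model of a lazily evaluated frame; NOT datatable's API.
from typing import Callable, List, Optional


class LazyFrame:
    """A column whose rows are produced by a deferred computation."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute          # deferred work, e.g. a filter or join
        self._data: Optional[List[int]] = None
        self.evaluations = 0             # how many times the work actually ran

    def persist(self) -> "LazyFrame":
        """Force the deferred computation to run now and cache its result."""
        if self._data is None:
            self._data = self._compute()
            self.evaluations += 1
        return self

    def to_list(self) -> List[int]:
        # Any consumer implicitly materializes the data if persist() was not called.
        return list(self.persist()._data)


def filter_positive(values: List[int]) -> LazyFrame:
    # Hypothetical analogue of dt[f.val > 0, :]: the filter is recorded, not executed.
    return LazyFrame(lambda: [v for v in values if v > 0])


ans = filter_positive([-2, 5, 0, 7])
assert ans.evaluations == 0      # nothing computed yet
ans.persist()                    # the cost is paid here, at a moment we choose
assert ans.evaluations == 1
print(ans.to_list())             # [5, 7]; later use does not recompute
```

Calling persist() again is a no-op in this sketch, because the cached result is reused.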
Because the API requested here is missing, Python datatable currently incurs extra overhead in the db-benchmark join task, where ans.copy(deep=True) is used as a workaround. AFAIU, this not only forces the computation but also allocates a duplicate of ans, which is unnecessary.
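A similar toy model (again hypothetical, not datatable's implementation) shows why the deep-copy workaround is wasteful: it forces the deferred computation and then duplicates the result, so two result buffers exist until the original is released, whereas the proposed persist() would keep just one. `LazyFrame`, `deep_copy()`, `persist()` and the `allocations` counter are illustrative names only.

```python
# Hypothetical toy model, NOT datatable's implementation: it only counts how
# many result buffers each approach allocates.
from typing import Callable, List, Optional


class LazyFrame:
    allocations = 0  # class-wide count of materialized result buffers

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute              # deferred work (filter, join, ...)
        self._data: Optional[List[int]] = None

    def _materialize(self) -> List[int]:
        # Run the deferred computation once and keep its result.
        if self._data is None:
            self._data = self._compute()
            LazyFrame.allocations += 1
        return self._data

    def deep_copy(self) -> "LazyFrame":
        # Analogue of the ans.copy(deep=True) workaround: forces the
        # computation, then duplicates the freshly computed buffer.
        data = self._materialize()
        clone = LazyFrame(self._compute)
        clone._data = list(data)
        LazyFrame.allocations += 1
        return clone

    def persist(self) -> "LazyFrame":
        # Proposed behaviour: materialize in place, no duplicate.
        self._materialize()
        return self


values = list(range(-5, 6))

LazyFrame.allocations = 0
ans = LazyFrame(lambda: [v for v in values if v > 0])
ans = ans.deep_copy()
copy_allocs = LazyFrame.allocations       # original result + its copy

LazyFrame.allocations = 0
ans = LazyFrame(lambda: [v for v in values if v > 0]).persist()
persist_allocs = LazyFrame.allocations    # only the result itself

print(copy_allocs, persist_allocs)        # 2 1
```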
Looking forward to the requested API so we can remove this unnecessary overhead.
Any idea whether this FR is addressed by materialize?