ibis
ibis copied to clipboard
perf(dask): avoid triggering compute for dynamic limit/offset
You should be able to avoid the extra compute here by doing a
map_partitionscall of a function that applies thebetweenoperation below. This will avoid excess compute if the values ofn/offsetare dependent on anything indf.Something like (untested):
# define this at the top-level for cheaper pickling
def limit_chunk(df, col, n, offset):
n = n.iat[0, 0]
offset = offset.iat[0, 0]
if n is None:
return df[df[col] >= offset]
else:
return df[df[col].between(offset, offset + n - 1)]
# then in the visit function, something like:
return df.map_partitions(limit, col=name, n=n, offset=offset, align_partitions=False, meta=df._meta).drop(columns=[name])
Originally posted by @jcrist in https://github.com/ibis-project/ibis/pull/8005#discussion_r1473081193