ibis
ibis copied to clipboard
[HELP NEEDED] How to insert `RAND(seed)` in `ORDER BY` clause for impala backend?
Problem
I need deterministic random sampling within partitions using a seed value. I want to generate SQL like ORDER BY RAND(42) but cannot find a way to pass a seed parameter to ibis random functions.
Current Code
# This works for random sampling (no seed)
order_by = [ibis.random()]
window_spec = ibis.window(group_by=['day_partition', 'hour_partition'], order_by=order_by)
Desired SQL Output
ROW_NUMBER() OVER (PARTITION BY day_partition, hour_partition ORDER BY RAND(42))
Attempted Solutions
- ibis.random() with seed - No seed parameter supported
- ops.RandomScalar(seed=42) - TypeError: got an unexpected keyword argument 'seed'
- ibis.sql(f"RAND({seed})") - AttributeError: module 'ibis' has no attribute 'sql'
- ibis.literal(f"RAND({seed})") - Produces string literal 'RAND(42)' instead of function call
- ibis.param("int64") - Results in ORDER BY 42 (constant value, no randomness)
Thanks in advance.
There are a number of issues with making an API for this. See here for a discussion.
In the meantime, for Impala, you should be able to do this:
In [38]: con = ibis.impala.connect()
In [39]: t = con.create_table("t", obj=pd.DataFrame({"a": [1, 2, 3]}))
In [40]: t
Out[40]:
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 3 │
│ 2 │
│ 1 │
└───────┘
In [41]: randomized = t.sql("select * from t order by rand(42)")
In [42]: randomized
Out[42]:
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
│ 1 │
└───────┘
Thanks for the explanation! I applied the method you suggested : )