risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

Support `random()` in batch queries

Open fuyufjh opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

Handy for generating testing data.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

fuyufjh avatar May 10 '24 09:05 fuyufjh

The corresponding setseed function seems non-trivial. Note the following query is actually deterministic:

test=# select case n when 1 then setseed(0.4) else null end, random() from generate_series(1, 10) as t(n);
 case |       random        
------+---------------------
      |  0.6722690410337004
      |  0.9936586191615264
      | 0.19947461195735405
      |    0.90340354600701
      |   0.516998156895498
      |  0.7913052710048887
      |   0.531550972556847
      | 0.48302401286537666
      | 0.14525565255427209
      | 0.19221106009513123
(10 rows)

So what random() really does in PostgreSQL is mutating a session-level variable: https://www.postgresql.org/docs/16/sql-set.html#:~:text=TO%20value.-,SEED,-Sets%20the%20internal https://github.com/postgres/postgres/blob/REL_16_1/src/backend/utils/misc/guc_tables.c#L3695-L3704 https://github.com/postgres/postgres/blob/REL_16_1/src/common/pg_prng.c#L232-L234

Of course we can provide a weaker random() function first without the ability to generate deterministic sequence given a seed.

cc @wangrunji0408 may be more familiar with RNGs given experience working on madsim.

xiangjinwu avatar May 13 '24 05:05 xiangjinwu

So what random() really does in PostgreSQL is mutating a session-level variable:

Of course we can provide a weaker random() function first without the ability to generate deterministic sequence given a seed.

Either LGTM. If the session-level variable approach involves lots of work, I will prefer the 2nd

fuyufjh avatar May 13 '24 07:05 fuyufjh