cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

sql: cluster setting for early/late binding in udf body

Open chengxiong-ruan opened this issue 2 years ago • 7 comments

This commit adds a cluster setting to allow users to turn off early binding of referenced objects in function body at definition time.

Release note: None

chengxiong-ruan avatar Aug 09 '22 20:08 chengxiong-ruan

This change is Reviewable

cockroach-teamcity avatar Aug 09 '22 20:08 cockroach-teamcity

Thank you for doing this!

michae2 avatar Aug 09 '22 21:08 michae2

Is it ok/desirable that this is a cluster setting (aka, not permitting decisionmaking on a per session basis)?

jordanlewis avatar Aug 09 '22 21:08 jordanlewis

Is it ok/desirable that this is a cluster setting (aka, not permitting decisionmaking on a per session basis)?

I'll go one further: this needs to be a session variable.

Release note: None

This will ultimately need a release note, this is going to be something we want to make sure gets discussed in the docs.

ajwerner avatar Aug 09 '22 21:08 ajwerner

pkg/sql/vars.go line 2138 at r2 (raw file):

			return formatBoolAsPostgresSetting(evalCtx.SessionData().UserDefinedFunctionEarlyBinding), nil
		},
		GlobalDefault: globalTrue,

do we want a cluster setting as well so user can change the global default? ( I prefer not to heh..)

chengxiong-ruan avatar Aug 09 '22 22:08 chengxiong-ruan

I don't think we need such a setting now that we have database level session defaults

ajwerner avatar Aug 09 '22 23:08 ajwerner

It feels to me like we should indicate somewhere on the descriptor whether or not it's late bound. I have a feeling we'll want to treat the references differently. This feels like one of those changes which is going to leave us needing to think hard about things in the future. I can see why we want to do it now, and if folks do reach for it, it will be validating, but it leaves me a little bit worried.

Yeah, adding an a flag in the descriptor is a good idea. Thanks for the input. We'd also need to add validation for the descriptor as well to make sure references are correct if it's late bound (nothing is tracked). I'll address later after other priorities. For the leave us needing to think hard about things in the future part, that worth a conversation at some time.

(I don't think this is a stability period blocker. Will come back to this after other priorities being addressed)

chengxiong-ruan avatar Aug 10 '22 17:08 chengxiong-ruan

pkg/sql/opt/optbuilder/create_function.go line 49 at r3 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Yes, pg tracks user-defined types in UDF signature (argument types, return type) with whatever function body.

an example:

postgres=# CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TYPE

postgres=# create function f(mood) returns int language sql as $$ select 1 $$;
CREATE FUNCTION

postgres=# drop type mood;
ERROR:  cannot drop type mood because other objects depend on it
DETAIL:  function f(mood) depends on type mood
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

chengxiong-ruan avatar Aug 11 '22 14:08 chengxiong-ruan

@michae2 thanks for the review!

We still hesitate to do it. The reason is that turning this flag on and off and create functions with mixed behaviors might cause a lot of trouble to us in the future. More specifically, we'd have to think hard on reference tracking for two different paths, as well as validation, and not to mention the back references as well. There could other unknown problems hard to reason about without real use case (maybe index corruption and constraint violation we chatted about earlier). It'd be very error prone. It might also confuse users as they forgot what flags they set when creating a UDF as well. Adding this flag (and metadata tracking it) is likely trivial as it showed, but I think we need a good reason to compromise. And I also doubt migrating clients wants to import broken UDFs.

I'll close this PR for now. We can always add this later as more users ramp up with UDFs, at that moment hope we can make easier decision with real use cases. For 22.2, given the limitation of UDF, I doubt this flag would show much value either. One comprise we thought about is to add this flag but not document it anywhere, and only tell client when they need it. But I still want to be cautious with the reason above. Thanks again for bringing up this topic, it's super valuable to discuss on this and think through it.

chengxiong-ruan avatar Sep 30 '22 15:09 chengxiong-ruan

@chengxiong-ruan makes sense. If it's alright with you, I'm going to open an issue about late vs early binding, with some of the examples from the RFC discussion, so that we have something to point people to if they run into this.

michae2 avatar Sep 30 '22 16:09 michae2

@chengxiong-ruan makes sense. If it's alright with you, I'm going to open an issue about late vs early binding, with some of the examples from the RFC discussion, so that we have something to point people to if they run into this.

@michae2 yes, please. That'd be super helpful. Thanks!

chengxiong-ruan avatar Sep 30 '22 16:09 chengxiong-ruan

Ok, filed #89109.

michae2 avatar Sep 30 '22 18:09 michae2