pgrx icon indicating copy to clipboard operation
pgrx copied to clipboard

Converting `NULL` to rust-defined type causes server segfault

Open willmurnane opened this issue 2 years ago • 5 comments

First, the test case:

git checkout develop
cargo install --path cargo-pgx/
cd pgx-examples/custom_types
cargo pgx run
custom_types=# select null::animals;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> 

After attaching a debugger, I observed that this function is being called with its Datum argument null, but is_null set to false. This causes a null pointer dereference when the CStr object is initialized, and an attempt is made to calculate the length. See discord discussion for more context.

I think this may be due to a misunderstanding somewhere of what the semantics of NullableDatum.isnull are. There's at least one place in the PostgreSQL source code that sets value NULL but isnull false. Thus, it may be a logic error specific to the impl FromDatum for CStr types.

One possible fix for this is to do the null check ourselves, in get_nullable_datum:

--- a/pgx/src/fcinfo.rs
+++ b/pgx/src/fcinfo.rs
@@ -158,7 +158,9 @@ mod pg_12_13_14 {
         unsafe {
             let nargs = fcinfo.nargs;
             let len = std::mem::size_of::<pg_sys::NullableDatum>() * nargs as usize;
-            fcinfo.args.as_slice(len)[num].clone()
+            let mut result = fcinfo.args.as_slice(len)[num].clone();
+            result.isnull |= result.value.is_null();
+            result
         }
     }

Another is to check in the associated FromDatum implementation for pointer-y types, like this:

@@ -292,7 +292,7 @@ impl<'a> FromDatum for &'a crate::cstr_core::CStr {
        datum: pg_sys::Datum,
        is_null: bool,
    ) -> Option<&'a crate::cstr_core::CStr> {
-        if is_null {
+        if is_null || datum.is_null() {
            None
        } else {
            Some(crate::cstr_core::CStr::from_ptr(datum.ptr_cast()))

Other solutions may be possible and/or better, but these are the ones I can think of off the top of my head.

willmurnane avatar Aug 23 '22 20:08 willmurnane

Upon further investigation, the cause of this is in fmgr.c, here. This initializes isnull to false, whether str is NULL or not. I could file a bug with upstream, or we can do our own checks for nullity.

It appears this behavior was introduced in 2019, in this commit:

@@ -1708,35 +1530,35 @@ OidFunctionCall9Coll(Oid functionId, Oid collation, Datum arg1, Datum arg2,
Datum
InputFunctionCall(FmgrInfo *flinfo, char *str, Oid typioparam, int32 typmod)
{
-	FunctionCallInfoData fcinfo;
+	LOCAL_FCINFO(fcinfo, 3);
	Datum		result;

	if (str == NULL && flinfo->fn_strict)
		return (Datum) 0;		/* just return null result */

-	InitFunctionCallInfoData(fcinfo, flinfo, 3, InvalidOid, NULL, NULL);
+	InitFunctionCallInfoData(*fcinfo, flinfo, 3, InvalidOid, NULL, NULL);

-	fcinfo.arg[0] = CStringGetDatum(str);
-	fcinfo.arg[1] = ObjectIdGetDatum(typioparam);
-	fcinfo.arg[2] = Int32GetDatum(typmod);
-	fcinfo.argnull[0] = (str == NULL);
-	fcinfo.argnull[1] = false;
-	fcinfo.argnull[2] = false;
+	fcinfo->args[0].value = CStringGetDatum(str);
+	fcinfo->args[0].isnull = false;
+	fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+	fcinfo->args[1].isnull = false;
+	fcinfo->args[2].value = Int32GetDatum(typmod);
+	fcinfo->args[2].isnull = false;
...

The older code set str == NULL as the isnull value, the new code unconditionally sets false.

willmurnane avatar Aug 24 '22 12:08 willmurnane

Based on this comment in fmgr.c:

"str" may be NULL to indicate we are reading a NULL. In this case the caller should assume the result is NULL, but we'll call the input function anyway if it's not strict. So this is almost but not quite the same as FunctionCall3.

it appears that Postgres thinks you Should Not make NULL convert to a meaningful value in your custom type; SELECT NULL::custom_type should not return anything but NULL. This makes me think that #641 is the recommended approach, rather than #646.

willmurnane avatar Aug 24 '22 13:08 willmurnane

Yes, it seems that these functions should almost always be strict, but there may be an "almost" there that complicates things... https://github.com/postgres/postgres/blob/e890ce7a4feb9d72cd502d52c5a4c4d853801974/src/backend/utils/fmgr/fmgr.c#L1534-L1545

Consulting the documentation for CREATE TYPE:

The input function must return a value of the data type itself. Usually, an input function should be declared STRICT; if it is not, it will be called with a NULL first parameter when reading a NULL input value. The function must still return NULL in this case, unless it raises an error. (This case is mainly meant to support domain input functions, which might need to reject NULL inputs.) The output function must be declared as taking one argument of the new data type. The output function must return type cstring. Output functions are not invoked for NULL values.

So if the input arg is nullptr, the function should return nullptr or error, apparently?

workingjubilee avatar Aug 26 '22 22:08 workingjubilee

I could file a bug with upstream, or we can do our own checks for nullity.

Both should happen, so if you could, please. It seems very unlikely this was what was intended, but also we should unfortunately program defensively against Postgres giving us bad data here except in places where we have total control.

workingjubilee avatar Aug 28 '22 23:08 workingjubilee

This is great feedback. I modified the *InOutFuncs traits to have another member:

    /// If PostgreSQL calls the conversion function with NULL as an argument, what
    /// error message should be generated?
    fn null_error_message() -> Option<&'static str> {
        None
    }

This function is called whenever NULL is supplied as an argument, and if it returns None then the result is also NULL. If Some string is returned, then that is provided to the error! macro to generate the appropriate error message.

This enforces the contract that the PostgreSQL documentation describes: if NULL is provided, the only two permissible outcomes are that 1) the function returns NULL, or 2) an error message is generated. This is simpler and conforms better to the documentation than the earlier attempt I had made, and #646 is now updated to include this approach.

willmurnane avatar Aug 29 '22 13:08 willmurnane