UDTF crashes the server if `return != len(input)`

Open tupui opened this issue 3 years ago • 3 comments

The server crashes on the following example when the input column has more than 2 rows and the function returns 2:

@omnisci('int32(Column<int32>, OutputColumn<int32>)')
def example(input, out):
    size = len(input)
    for i in range(size):
        out[i] = input[i]
    return 2

tupui avatar Jan 27 '22 20:01 tupui

The server likely crashes because no memory has been allocated to the output parameter out. Use:

@omnisci('int32(Column<int32>, OutputColumn<int32>)')
def example(input, out):
    size = len(input)
    set_output_row_size(size)
    for i in range(size):
        out[i] = input[i]
    return size

On the other hand, avoiding the server crash on the issue example may require analyzing the generated code and the corresponding signature. For instance, when the signature specifies no sizer arguments and the body makes no call to the set_output_row_size function, the resulting operator will likely crash the server. Another approach would be to implement a range check on indexing input and output columns, so that running the above example on the server would raise an index error but keep the server alive.
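To illustrate the range-check idea, here is a minimal sketch in plain Python. It does not use RBC or the server: `CheckedColumn`, `make_input`, and `resize` are hypothetical stand-ins (with `resize` playing the role of `set_output_row_size`), chosen only to show how a bounds check could turn the out-of-bounds write from the issue example into an `IndexError`.

```python
class CheckedColumn:
    """Hypothetical stand-in for a column buffer with range-checked indexing."""

    def __init__(self, size=0):
        # A freshly created output column has no rows allocated.
        self._data = [0] * size

    def __len__(self):
        return len(self._data)

    def resize(self, size):
        # Assumption: plays the role of set_output_row_size in this sketch.
        self._data = [0] * size

    def _check(self, i):
        if not 0 <= i < len(self._data):
            raise IndexError(
                f"index {i} out of range for column of size {len(self._data)}"
            )

    def __getitem__(self, i):
        self._check(i)
        return self._data[i]

    def __setitem__(self, i, value):
        self._check(i)
        self._data[i] = value


def make_input(values):
    # Build an input column that holds the given values.
    col = CheckedColumn(len(values))
    for i, v in enumerate(values):
        col[i] = v
    return col


def example_unallocated(inp, out):
    # Mirrors the issue example: writes to `out` without allocating rows.
    size = len(inp)
    for i in range(size):
        out[i] = inp[i]
    return 2


def example_allocated(inp, out):
    # Mirrors the suggested fix: allocate the output rows before writing.
    size = len(inp)
    out.resize(size)
    for i in range(size):
        out[i] = inp[i]
    return size
```

With this sketch, `example_unallocated(make_input([1, 2, 3]), CheckedColumn())` raises `IndexError` on the first write instead of touching unallocated memory, while `example_allocated` copies all three rows and returns 3.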

pearu avatar Jan 27 '22 21:01 pearu

But then how to only return a slice? I thought this return could be used for that?

For crashing the server, would there be a way to use some sort of sandbox or pre-validation? It would be good to check the function when it's being registered so that a user cannot crash the server.

tupui avatar Jan 28 '22 07:01 tupui

> But then how to only return a slice? I thought this return could be used for that?

There are a number of ways (perhaps too many) to specify the size of output columns, and each has its advantages and disadvantages. I'll give a summary elsewhere.

> For crashing the server, would there be a way to use some sort of sandbox or pre-validation? It would be good to check the function when it's being registered so that a user cannot crash the server.

Sure, that would be desirable, but technically it is not trivial. For instance, pre-validation requires generating sample inputs for table functions, which means that if a table function defines restrictions on its arguments, the samples must obey them as well. And even then, one can likely construct a function that crashes the server on specific inputs while executing fine on the samples.
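A small plain-Python sketch of that last point, under stated assumptions: `checked_read`, `tricky`, and `prevalidate` are hypothetical names, and an `IndexError` stands in for what would be an out-of-bounds memory access on the real server. The function misbehaves only when the first value is 42, so validation on ordinary samples passes.

```python
def checked_read(values, i):
    # Stand-in for a column read; raises instead of reading past the end.
    if not 0 <= i < len(values):
        raise IndexError(f"index {i} out of range for size {len(values)}")
    return values[i]


def tricky(values):
    # Sums the input; reads one element past the end only for a specific input.
    total = 0
    for i in range(len(values)):
        total += checked_read(values, i)
    if len(values) > 0 and values[0] == 42:
        total += checked_read(values, len(values))  # out of bounds
    return total


def prevalidate(func, samples):
    # Hypothetical registration-time check: run the function on sample inputs.
    for sample in samples:
        try:
            func(sample)
        except IndexError:
            return False
    return True


samples = [[1, 2, 3], [0], []]
passed = prevalidate(tricky, samples)  # True: no sample triggers the bad read
```

Here `passed` is `True`, yet `tricky([42, 1])` still raises `IndexError`, so passing validation on samples does not show that the function is safe for all inputs.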

pearu avatar Jan 28 '22 10:01 pearu