fix(rpc): Improve input validation and error handling
Fixes #13067
The rpc-server was vulnerable to denial-of-service attacks via several RPC commands (`SET_TENSOR`, `GRAPH_COMPUTE`, etc.). Malformed messages could trigger failed assertions (e.g., an invalid `ggml_type`) or out-of-bounds reads/writes leading to `GGML_ABORT` calls, crashing the server process.
This PR introduces robust input validation and replaces `abort()` calls with graceful error handling:
- Type Validation: `deserialize_tensor` now checks that `tensor->type` is within the valid `GGML_TYPE_COUNT` range before calling `ggml_new_tensor_4d`, and returns `nullptr` on an invalid type.
- Bounds Checks: replaced `GGML_ABORT` in the `set_tensor`, `set_tensor_hash`, and `get_tensor` handlers with error logging and a `false` return when data/offset parameters are out of buffer bounds.
- Error Propagation:
  - `create_node` now checks for `nullptr` return values from `deserialize_tensor` and from its recursive calls, propagating `nullptr` upwards on failure. It also uses `find` instead of `at` for safer map access.
  - `copy_tensor` now checks for `nullptr` from `deserialize_tensor` and sets the response status to failure if deserialization or bounds checks fail.
  - `graph_compute` now checks for a `nullptr` return from `create_node` and returns a failure status; the final return value now reflects the actual computation status.
  - `RPC_CMD_GET_ALLOC_SIZE`: the RPC server loop now checks the return value of `server.get_alloc_size` and, if the call fails, returns early to close the connection.
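The type and bounds checks above boil down to two small predicates. The sketch below is illustrative only: the constant standing in for `GGML_TYPE_COUNT` and the helper names are assumptions, not the real ggml-rpc code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: the real enum ggml_type and GGML_TYPE_COUNT
// live in ggml.h; the value below is purely illustrative.
constexpr uint32_t kTypeCount = 39;

// Type check in the spirit of the deserialize_tensor fix: reject
// out-of-range type ids up front instead of letting ggml assert later.
bool is_valid_tensor_type(uint32_t type) {
    return type < kTypeCount;
}

// Bounds check in the spirit of the set_tensor/get_tensor fixes:
// verify that [offset, offset + size) lies within the buffer, written
// so that offset + size cannot overflow uint64_t.
bool range_within_buffer(uint64_t offset, uint64_t size, uint64_t buffer_size) {
    if (offset > buffer_size) {
        return false;
    }
    return size <= buffer_size - offset;
}
```

Note that the overflow-safe form (`size <= buffer_size - offset`) matters: the naive `offset + size <= buffer_size` can wrap around and accept an out-of-bounds range.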
In my opinion this may affect performance; maybe put it behind a feature flag (via CMake)?
> In my opinion this may affect performance; maybe put it behind a feature flag (via CMake)?
I believe it would be interesting to see what the performance impact of this change is. I'm new to the project so pointers welcome if there's a test suite available which would show that.
Slightly off-topic but related: I think there are plenty of opportunities for similar improvements in the RPC server, from invalid tensor operations to crashes via deep recursion in `create_node`, which I would also like to fix. I'd like to work on those one change at a time, though.
I think putting multiple critical fixes behind a feature flag would be counterintuitive. Rather, build bench tooling (if needed) and iterate on the fixes so there's a minimal performance hit.
> I believe it would be interesting to see what the performance impact of this change is. I'm new to the project so pointers welcome if there's a test suite available which would show that.
We use `llama-bench` to test performance.
> Slightly off-topic but related: I think there's plenty of opportunities for similar improvements in the RPC server.
The best investment of efforts in this direction would be creating a script/job for coverage guided fuzzing. This way we can automatically test for security issues when we make RPC changes and even integrate it into the CI.
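As a starting point, a coverage-guided harness can be as small as a libFuzzer entry point that feeds raw bytes into the server's message deserialization. `parse_rpc_message` below is a hypothetical stand-in for whatever parsing entry point would get factored out of ggml-rpc.cpp; the toy body only exists so the sketch links:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: a real harness would call into the RPC
// message deserialization factored out of ggml-rpc.cpp. This toy body
// just rejects inputs too short to hold a command header.
static bool parse_rpc_message(const uint8_t * data, size_t size) {
    if (data == nullptr || size < 4) {
        return false;
    }
    return true;
}

// libFuzzer entry point; a typical build line would look like:
//   clang++ -g -fsanitize=fuzzer,address harness.cpp <ggml objects>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
    parse_rpc_message(data, size);
    return 0; // any input that does not crash or trip a sanitizer is "ok"
}
```

With `-fsanitize=fuzzer,address`, the sanitizer runtime supplies `main` and reports crashes, out-of-bounds accesses, and hangs automatically, which is exactly the class of bugs this PR fixes by hand.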
`RPC_CMD_GET_ALLOC_SIZE` does not check for errors, and if the call to `get_alloc_size` fails it will leave the client connected in a bad state:
https://github.com/ggml-org/llama.cpp/blob/604f0a0045ecb1988a32dbdad7ba4984df9c7ecd/ggml/src/ggml-rpc/ggml-rpc.cpp#L1470
Thanks @slaren, added it to this same PR since it falls under the same scope. https://github.com/ggml-org/llama.cpp/pull/13069/commits/e6dd976acf26cc929431bbd93651702fa7e7956c
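A minimal sketch of the error-propagation pattern described for this handler, using simplified stand-in types (the real request/response structs and member functions in ggml-rpc.cpp differ):

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the real RPC message structs.
struct get_alloc_size_req { uint32_t tensor_type; };
struct get_alloc_size_rsp { uint64_t alloc_size; };

// Stand-in for server.get_alloc_size: fails on an invalid tensor type
// instead of asserting. kTypeCount is illustrative, not the real value.
bool get_alloc_size(const get_alloc_size_req & req, get_alloc_size_rsp & rsp) {
    constexpr uint32_t kTypeCount = 39;
    if (req.tensor_type >= kTypeCount) {
        return false;
    }
    rsp.alloc_size = 64; // the real code queries the backend
    return true;
}

// Server-loop dispatch sketch: on failure, return false so the caller
// closes the connection instead of replying with an uninitialized rsp.
bool handle_get_alloc_size(const get_alloc_size_req & req) {
    get_alloc_size_rsp rsp{};
    if (!get_alloc_size(req, rsp)) {
        return false; // caller tears down the socket
    }
    // send_msg(&rsp, sizeof(rsp)); // reply only on success
    return true;
}
```

Returning early without sending a response is what lets the server loop close the connection cleanly rather than leaving the client waiting on a reply that will never be valid.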
> The best investment of efforts in this direction would be creating a script/job for coverage guided fuzzing.
Sounds good, I can look into that after this PR 👍