
fix(rpc): Improve input validation and error handling

Open thevilledev opened this pull request 8 months ago • 6 comments

Fixes #13067

The rpc-server was vulnerable to Denial of Service attacks via several RPC commands (SET_TENSOR, GRAPH_COMPUTE, etc.). Malformed messages could trigger failed assertions (e.g., invalid ggml_type) or out-of-bounds reads/writes leading to GGML_ABORT calls, crashing the server process.

This PR introduces robust input validation and replaces abort() calls with graceful error handling:

  • Type Validation: deserialize_tensor now checks that tensor->type is within the valid range (below GGML_TYPE_COUNT) before calling ggml_new_tensor_4d, and returns nullptr on an invalid type (see the sketch after this list).
  • Bounds Checks: Replaced GGML_ABORT in set_tensor, set_tensor_hash, and get_tensor handlers with error logging and returning false when data/offset parameters are out of buffer bounds.
  • Error Propagation:
    • create_node now checks for nullptr return values from deserialize_tensor and its recursive calls, propagating nullptr upwards on failure. Uses find instead of at for safer map access.
    • copy_tensor now checks for nullptr from deserialize_tensor and sets the response status to failure if deserialization or bounds checks fail.
    • graph_compute now checks for nullptr return from create_node and returns failure status correctly. The final return value now reflects the actual computation status.
    • RPC_CMD_GET_ALLOC_SIZE now checks the return value of server.get_alloc_size in the RPC server loop; if the call fails, the loop returns early to close the connection.
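
Condensed, the pattern across the first three bullets looks roughly like the sketch below. This is a simplified illustration based on the description above, not the verbatim diff; helper and field names follow ggml-rpc.cpp, and names like `tensor_map` are stand-ins.

```cpp
// Simplified sketch -- based on the PR description, not the exact diff.
// Surrounding code is elided.

// 1) Type validation in deserialize_tensor: reject out-of-range wire types
//    before ggml_new_tensor_4d can trip an assertion.
ggml_tensor * rpc_server::deserialize_tensor(ggml_context * ctx, const rpc_tensor * tensor) {
    if (tensor->type >= GGML_TYPE_COUNT) {
        GGML_LOG_ERROR("[%s] invalid tensor type: %u\n", __func__, tensor->type);
        return nullptr; // callers must now handle this instead of crashing
    }
    ggml_tensor * result = ggml_new_tensor_4d(ctx, (ggml_type) tensor->type,
                                              tensor->ne[0], tensor->ne[1],
                                              tensor->ne[2], tensor->ne[3]);
    // ... remaining deserialization unchanged ...
    return result;
}

// 2) Bounds checks in the set_tensor / set_tensor_hash / get_tensor handlers:
//    log and report failure instead of GGML_ABORT. `base` and `buf_size`
//    describe the destination buffer; `offset` and `size` come from the wire.
if (offset < base || offset + size > base + buf_size) {
    GGML_LOG_ERROR("[%s] tensor data region out of buffer bounds\n", __func__);
    return false; // previously: GGML_ABORT(...)
}

// 3) Safer map access in create_node: find() instead of at(), so an unknown
//    tensor id propagates nullptr up to graph_compute rather than throwing.
auto it = tensor_map.find(id);
if (it == tensor_map.end()) {
    return nullptr;
}
```

The common design choice is that every handler now reports failure through its return value, so the server loop can answer with an error status or drop the connection instead of aborting the process.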

thevilledev avatar Apr 22 '25 14:04 thevilledev

In my opinion it may affect performance; maybe use a feature flag (via CMake)?

lexasub avatar Apr 22 '25 21:04 lexasub

In my opinion it may affect performance; maybe use a feature flag (via CMake)?

I believe it would be interesting to see what the performance impact of this change is. I'm new to the project, so pointers are welcome if there's a test suite available that would show it.

Slightly off-topic but related: I think there are plenty of opportunities for similar improvements in the RPC server, from invalid tensor operations to crashes via deep recursion in create_node, which I'd also like to fix. I'd like to work on those one change at a time, though.

I think putting multiple critical fixes behind a feature flag would be counterproductive. I'd rather build benchmarking tooling (if needed) and iterate on the fixes so the performance hit stays minimal.

thevilledev avatar Apr 23 '25 18:04 thevilledev

I believe it would be interesting to see what the performance impact of this change is. I'm new to the project, so pointers are welcome if there's a test suite available that would show it.

We use llama-bench to test performance.
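
For a rough before/after comparison over the RPC backend, something like the following should work (host and port are placeholders; check the rpc-server and llama-bench help output for the exact flags on your build):

```sh
# On the remote machine: start the RPC server (port is illustrative)
./rpc-server -p 50052

# On the client: run the benchmark through the RPC backend, once against
# a patched server and once against an unpatched one, and compare t/s
./llama-bench -m model.gguf --rpc 192.168.1.42:50052
```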

Slightly off-topic but related: I think there are plenty of opportunities for similar improvements in the RPC server.

The best investment of effort in this direction would be creating a script/job for coverage-guided fuzzing. This way we can automatically test for security issues whenever we make RPC changes, and even integrate it into the CI.
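
A minimal libFuzzer-style harness could look like the sketch below. `handle_rpc_message` is hypothetical: the server's command dispatch would first need to be factored out of the socket loop so it can be driven from a raw byte buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: parse and dispatch a single RPC message from a raw
// buffer, exercising the same validation paths as the live server but with
// no socket I/O. Returns false on malformed input.
bool handle_rpc_message(const uint8_t * data, size_t size);

// libFuzzer entry point: called repeatedly with mutated inputs. Any abort,
// failed assertion, or sanitizer-detected OOB access is reported as a crash.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
    handle_rpc_message(data, size); // must fail gracefully, never abort
    return 0;
}
```

Built with `clang++ -g -fsanitize=fuzzer,address`, a harness like this would automatically hunt for exactly the class of crashes this PR fixes by hand.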

rgerganov avatar Apr 24 '25 08:04 rgerganov

RPC_CMD_GET_ALLOC_SIZE does not check for errors, and if the call to get_alloc_size fails it will leave the client connected in a bad state: https://github.com/ggml-org/llama.cpp/blob/604f0a0045ecb1988a32dbdad7ba4984df9c7ecd/ggml/src/ggml-rpc/ggml-rpc.cpp#L1470

slaren avatar Apr 24 '25 10:04 slaren

Thanks @slaren, added it to this same PR since it falls under the same scope. https://github.com/ggml-org/llama.cpp/pull/13069/commits/e6dd976acf26cc929431bbd93651702fa7e7956c
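
For reference, per the PR description the fix boils down to checking the handler's return value in the dispatch loop and dropping the connection on failure, roughly like this (surrounding loop code abbreviated; message struct names as in ggml-rpc.cpp at the time):

```cpp
case RPC_CMD_GET_ALLOC_SIZE: {
    rpc_msg_get_alloc_size_req request;
    if (!recv_msg(sockfd, &request, sizeof(request))) {
        return;
    }
    rpc_msg_get_alloc_size_rsp response;
    if (!server.get_alloc_size(request, response)) {
        return; // previously ignored -- now the connection is closed on failure
    }
    if (!send_msg(sockfd, &response, sizeof(response))) {
        return;
    }
    break;
}
```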

thevilledev avatar Apr 24 '25 17:04 thevilledev

The best investment of effort in this direction would be creating a script/job for coverage-guided fuzzing.

Sounds good, I can look into that after this PR 👍

thevilledev avatar Apr 24 '25 17:04 thevilledev