fix(rpc): Improve input validation and error handling
Fixes #13067
The rpc-server was vulnerable to denial-of-service attacks via several RPC commands (`SET_TENSOR`, `GRAPH_COMPUTE`, etc.). Malformed messages could trigger failed assertions (e.g., an invalid `ggml_type`) or out-of-bounds reads/writes leading to `GGML_ABORT` calls, crashing the server process.
This PR introduces robust input validation and replaces `abort()` calls with graceful error handling:
- Type Validation: `deserialize_tensor` now checks that `tensor->type` is within the valid `GGML_TYPE_COUNT` range before calling `ggml_new_tensor_4d`, and returns `nullptr` on an invalid type.
- Bounds Checks: replaced `GGML_ABORT` in the `set_tensor`, `set_tensor_hash`, and `get_tensor` handlers with error logging and a `false` return when data/offset parameters are out of buffer bounds.
- Error Propagation:
  - `create_node` now checks for `nullptr` return values from `deserialize_tensor` and from its recursive calls, propagating `nullptr` upwards on failure. It also uses `find` instead of `at` for safer map access.
  - `copy_tensor` now checks for `nullptr` from `deserialize_tensor` and sets the response status to failure if deserialization or bounds checks fail.
  - `graph_compute` now checks for a `nullptr` return from `create_node` and returns a failure status; the final return value now reflects the actual computation status.
  - `RPC_CMD_GET_ALLOC_SIZE`: the RPC server loop now checks the return value of `server.get_alloc_size` and, if the call fails, returns early to close the connection.
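The type and bounds checks above boil down to two small predicates. The sketch below is illustrative only: the constant standing in for `GGML_TYPE_COUNT` and the helper names are assumptions, not the real ggml-rpc code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: the real enum ggml_type and GGML_TYPE_COUNT
// live in ggml.h; the value below is purely illustrative.
constexpr uint32_t kTypeCount = 39;

// Type check in the spirit of the deserialize_tensor fix: reject
// out-of-range type ids up front instead of letting ggml assert later.
bool is_valid_tensor_type(uint32_t type) {
    return type < kTypeCount;
}

// Bounds check in the spirit of the set_tensor/get_tensor fixes:
// verify that [offset, offset + size) lies within the buffer, written
// so that offset + size cannot overflow uint64_t.
bool range_within_buffer(uint64_t offset, uint64_t size, uint64_t buffer_size) {
    if (offset > buffer_size) {
        return false;
    }
    return size <= buffer_size - offset;
}
```

Note that the overflow-safe form (`size <= buffer_size - offset`) matters: the naive `offset + size <= buffer_size` can wrap around and accept an out-of-bounds range.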
In my opinion this may affect performance; maybe put it behind a feature flag (via CMake)?
> In my opinion this may affect performance; maybe put it behind a feature flag (via CMake)?
I believe it would be interesting to see what the performance impact of this change is. I'm new to the project so pointers welcome if there's a test suite available which would show that.
Slightly off-topic but related: I think there are plenty of opportunities for similar improvements in the RPC server, from invalid tensor operations to crashes via deep recursion in `create_node`, which I would also like to fix. I'd like to work on those one change at a time, though.
I think putting multiple critical fixes behind a feature flag would be counterintuitive. Rather, build bench tooling (if needed) and iterate on the fixes so there's a minimal performance hit.
> I believe it would be interesting to see what the performance impact of this change is. I'm new to the project so pointers welcome if there's a test suite available which would show that.
We use `llama-bench` to test performance.
> Slightly off-topic but related: I think there's plenty of opportunities for similar improvements in the RPC server.
The best investment of efforts in this direction would be creating a script/job for coverage guided fuzzing. This way we can automatically test for security issues when we make RPC changes and even integrate it into the CI.
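As a starting point, a coverage-guided harness can be as small as a libFuzzer entry point that feeds raw bytes into the server's message deserialization. `parse_rpc_message` below is a hypothetical stand-in for whatever parsing entry point would get factored out of ggml-rpc.cpp; the toy body only exists so the sketch links:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: a real harness would call into the RPC
// message deserialization factored out of ggml-rpc.cpp. This toy body
// just rejects inputs too short to hold a command header.
static bool parse_rpc_message(const uint8_t * data, size_t size) {
    if (data == nullptr || size < 4) {
        return false;
    }
    return true;
}

// libFuzzer entry point; a typical build line would look like:
//   clang++ -g -fsanitize=fuzzer,address harness.cpp <ggml objects>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
    parse_rpc_message(data, size);
    return 0; // any input that does not crash or trip a sanitizer is "ok"
}
```

With `-fsanitize=fuzzer,address`, the sanitizer runtime supplies `main` and reports crashes, out-of-bounds accesses, and hangs automatically, which is exactly the class of bugs this PR fixes by hand.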
`RPC_CMD_GET_ALLOC_SIZE` does not check for errors, and if the call to `get_alloc_size` fails it will leave the client connected in a bad state:
https://github.com/ggml-org/llama.cpp/blob/604f0a0045ecb1988a32dbdad7ba4984df9c7ecd/ggml/src/ggml-rpc/ggml-rpc.cpp#L1470
Thanks @slaren, added it to this same PR since it falls under the same scope. https://github.com/ggml-org/llama.cpp/pull/13069/commits/e6dd976acf26cc929431bbd93651702fa7e7956c
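A minimal sketch of the error-propagation pattern described for this handler, using simplified stand-in types (the real request/response structs and member functions in ggml-rpc.cpp differ):

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the real RPC message structs.
struct get_alloc_size_req { uint32_t tensor_type; };
struct get_alloc_size_rsp { uint64_t alloc_size; };

// Stand-in for server.get_alloc_size: fails on an invalid tensor type
// instead of asserting. kTypeCount is illustrative, not the real value.
bool get_alloc_size(const get_alloc_size_req & req, get_alloc_size_rsp & rsp) {
    constexpr uint32_t kTypeCount = 39;
    if (req.tensor_type >= kTypeCount) {
        return false;
    }
    rsp.alloc_size = 64; // the real code queries the backend
    return true;
}

// Server-loop dispatch sketch: on failure, return false so the caller
// closes the connection instead of replying with an uninitialized rsp.
bool handle_get_alloc_size(const get_alloc_size_req & req) {
    get_alloc_size_rsp rsp{};
    if (!get_alloc_size(req, rsp)) {
        return false; // caller tears down the socket
    }
    // send_msg(&rsp, sizeof(rsp)); // reply only on success
    return true;
}
```

Returning early without sending a response is what lets the server loop close the connection cleanly rather than leaving the client waiting on a reply that will never be valid.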
> The best investment of efforts in this direction would be creating a script/job for coverage guided fuzzing.
Sounds good, I can look into that after this PR 👍