monero icon indicating copy to clipboard operation
monero copied to clipboard

[Proposal] Deprecate RPC binary strings

Open hinto-janai opened this issue 6 months ago • 7 comments

What

This is a proposal to deprecate binary string usage in monerod's daemon RPC interface.

Binary strings are JSON strings that appear in monerod's daemon RPC output that contain raw binary data.

There are 2 JSON-RPC methods whose responses contain these binary strings:

Example

For example, the output of get_txpool_backlog:

curl \
    http://127.0.0.1:18081/json_rpc \
    -d '{"jsonrpc":"2.0","id":"0","method":"get_txpool_backlog"}' \
    -H 'Content-Type: application/json' \
    --output -

is:

{
  "id": "0",
  "jsonrpc": "2.0",
  "result": {
    "backlog": "�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������",
    "credits": 0,
    "status": "OK",
    "top_hash": "",
    "untrusted": false
  }
}

The backlog field is a JSON string containing raw binary data.

Problem

The main problem with these binary strings is that when represented as a string, they do not always contain valid Unicode or escape characters. This is against the JSON specification.

Any sane JSON parser will reject this mixed format, for example:

jq
# jq: parse error: Invalid escape at line 1, column 418

echo '{"backlog": "�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������"}' | jq
Python
# Traceback (most recent call last):
#   File "/tmp/a/a.py", line 4, in <module>
#     json.loads(backlog)
#   File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
#     return _default_decoder.decode(s)
#            ^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
#     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
#                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/usr/lib/python3.12/json/decoder.py", line 353, in raw_decode
#     obj, end = self.scan_once(s, idx)
#                ^^^^^^^^^^^^^^^^^^^^^^
# json.decoder.JSONDecodeError: Invalid control character at: line 1 column 15 (char 14)

import json

backlog = '{"backlog": "�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������"}'
json.loads(backlog)
C++
/// main: rapidjson/document.h:1359:
///     rapidjson::GenericValue<Encoding, Allocator>::MemberIterator
/// 	rapidjson::GenericValue<Encoding, Allocator>::FindMember(const rapidjson::GenericValue<Encoding, SourceAllocator>&)
/// 	[
/// 		with
/// 		    SourceAllocator = rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>;
/// 			Encoding = rapidjson::UTF8<>;
/// 			Allocator = rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>;
/// 			MemberIterator = rapidjson::GenericMemberIterator<
/// 			    false,
/// 				rapidjson::UTF8<>,
/// 				rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>
/// 			>
/// 	]: Assertion `IsObject()' failed.
///
/// Aborted (core dumped)

#include "rapidjson/document.h"

using namespace rapidjson;

int main() {
    const char* json = "{\"backlog\": \"�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\
r@S\t;\n��������mK\n�������N6\n������\"}";

    Document document;
    document.Parse(json);
    document["backlog"].GetString();
}
Rust
/// thread 'main' panicked at src/main.rs:11:46:
/// called `Result::unwrap()` on an `Err` value: Error("invalid escape", line: 2, column: 54)
/// note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

#[derive(serde::Deserialize, serde::Serialize)]
struct Json {
    backlog: String,
}

const JSON: &str = r#"{"backlog": "�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������"}"#;

fn main() {
    serde_json::from_str::<Json>(JSON).unwrap();
}

Mixing binary and JSON in this manner means that users of these RPC calls must essentially implement a custom JSON parser themselves that can parse this format (including Monero itself).

Replacements

I propose that Monero replaces these binary strings with one of the following options:

  1. Full JSON
  2. Full binary
  3. Hexadecimal encoded binary within JSON

Full JSON

RPC output that have binary strings can be replaced with regular JSON output, for example:

{
  "backlog": [
    {
      "blob_size": 0,
      "fee": 0,
      "time_in_pool": 0
    }
  ]
}

Full JSON could be viable depending on how often these RPC call are used. It may not be if these methods are called frequently.

Full binary

The current JSON-RPC methods can be deprecated in favor of .bin endpoints (e.g. /get_txpool_backlog.bin) that expect full binary rather than the mixed format.

It is worth noting that an un-documented binary version of get_output_distribution is already used by wallet2, in favor of the mixed version.

Hexadecimal encoded binary within JSON

Similar to other methods/endpoints, Monero could encode this binary information in hexadecimal before serialization instead of embedding raw binary, for example:

{
  "backlog": "efbfbd...defbfbd"
}

Steps

If this proposal moves forward, the steps towards completion would be:

  1. Modify monerod's RPC (de)serialization
  2. Update any call-sites (e.g. wallet2)
  3. Update monero-site documentation

hinto-janai avatar Aug 05 '24 00:08 hinto-janai