research-lab icon indicating copy to clipboard operation
research-lab copied to clipboard

Ring Binning

Open UkoeHB opened this issue 3 years ago • 94 comments

note: tx chaining was removed after discussion (July 18, 2020)

Table Of Contents

  • Abstract
  • Motivation
  • Algorithm summary
  • Binning strategy
    • Bin configuration details
    • Bins
    • Binning algorithm: tx construction
      • Constants and inputs
      • Ring seed
      • Binning upper bound
      • Selecting bins
      • Selecting bin members
      • Full ring member set
    • Binning algorithm: tx validation
    • Modifications for small ring sizes (WIP)
  • Tx changes
    • Tx validation
    • Blockchain and tx hashes
  • Rationale
  • Backward compatibility
  • License
  • References

Abstract

Reference on-chain outputs for ring membership deterministically to minimize storage, and introduce a binning strategy to mitigate failures of the decoy selection distribution (e.g. gamma distribution).

Motivation

Monero is planning to implement a next-gen protocol with sublinear scaling that should allow ring sizes on the order of 2^6 to 2^8 (64 to 256) (either Triptych or some alternative). Referencing each ring member individually is inefficient, so 'deterministic' references are beneficial. 'Deterministic' means being able to recover an arbitrary number of on-chain indices from a small tuple of variables that include some kind of entropy.

As discussed in An Empirical Analysis of Traceability in the Monero Blockchain, selecting ring decoys directly from a distribution that spans the entire ledger is not perfect. If an observer has special timing knowledge about a given transaction, and/or knows that the selection distribution does not match the true spend distribution, then they can gain a significant advantage when trying to guess the true spends in that transaction.

That problem can be mitigated by selecting 'clumps' of decoys from the ledger-spanning selection distribution. A 'clump' (or 'bin') of decoys is a small set of decoys that are located very close to each other on the ledger. This way, even if an observer can guess the age of a transaction's true spend to within a few hours or less, the true spend will still be hidden among some decoys.

It isn't ideal to only select decoys that are close to the real spend. Even if some observers have special timing information about transactions, other observers may not. It is therefore useful to combine clumping/binning with selection from the ledger-wide distribution (recommended by Foundations of Ring Sampling).

This proposal describes a deterministic ring member referencing strategy where clumps/bins of decoys are selected from a ledger-wide distribution. Privacy considerations are discussed where appropriate.

Algorithm summary

To set the stage, I will briefly summarize the ring member referencing algorithm here (glossing over all the details, which can be found in the actual algorithm below). Each output spent by a transaction will have its own ring member reference tuple (set of variables to recover its set of ring members from).

  1. Define binning_upper_bound, which is the index of the highest output in the ledger that can be referenced by this input. Defining this is necessary so interacting with the ledger-wide selection distribution is deterministic.
  2. Choose a random output from a fixed range around the true spend [true_spend_index - bin_radius, true_spend_index + bin_radius], which is equal to the width that bins will have.
  3. Map that random output from the ledger-wide selection distribution (defined based on binning_upper_bound) to a uniform distribution with a CDF. Its value in the uniform distribution is denoted mapped_real_spend_bin_center.
  4. Use a hash function and public entropy to generate n points in the uniform distribution.
  5. Select a point n' at random from those n points. Define bin_rotator = mapped_real_spend_bin_center - n' mod uniform_distribution.size().
  6. Redefine all n points: n += bin_rotator mod uniform_distribution.size(). Note that n' will now equal mapped_real_spend_bin_center.
  7. Map all n points from the uniform distribution back into the ledger-wide selection distribution with a reverse CDF. All points mapped_n are 'bin centers', the centers of each of the bins in the final ring member reference set. If bins should have only one member each (bin_radius = 0), then mapped_n can be treated directly as ring member references and you don't need to proceed further.
  8. Around each bin center mapped_n, use a hash function and public entropy to generate m bin members from the range [mapped_n - bin_radius, mapped_n + bin_radius].
  9. In the bin that contains the real spend, randomly select a bin member m'. Define bin_member_rotator = real_spend_index - m' mod 2*bin_radius + 1.
  10. Redefine all m points in all bins: m = [(m - [mapped_real_spend_bin_center - bin_radius]) + bin_member_rotator mod 2*bin_radius + 1] + [mapped_real_spend_bin_center - bin_radius]. Note that m' will now equal real_spend_index.
  11. Return: hash function, public entropy, binning_upper_bound, bin_rotator, and bin_member_rotator. In practice, only bin_rotator and bin_member_rotator need to be unique between transaction inputs.

Binning strategy

For a given tx, reference all its inputs' ring members with the following strategy inspired by How to Squeeze a Crowd.

Bin configuration details

Each transaction has:

  • binning_upper_bound (varint): An on-chain index that sets the upper bound for the range of outputs that can be members of bins. There is one of these per transaction.

Each input has:

  • bin_rotator (unsigned 8-byte integer): Rotates pseudo-randomly generated bin centers around the bin selection distribution.
  • bin_member_rotator (varint): Rotates pseudo-randomly generated bin members within all bins.

Instead of public entropy, we will use a pseudo-random seed computed from parts of the transaction.

Bins

Define the number of bins for each input.

  • NUM_BINS = floor(sqrt(RING_SIZE)). [[[TODO: is logarithmic bin division better? other ideas? discussion is ongoing, however there is general consensus that in practice NUM_BINS should be >= current ring size (11)]]]
    • RING_SIZE: A constant defined by the Monero protocol.

Given uneven_bins = RING_SIZE mod NUM_BINS, define the number of bin members in each bin.

  • After computing each bin's bin_center (discussed below), sort the bins. Let index(bin, bins) give the index of bin in vector bins.
if (index(bin, bins) + 1 > NUM_BINS - uneven_bins)
    num_bin_members = ceil(RING_SIZE/NUM_BINS);
else
    num_bin_members = floor(RING_SIZE/NUM_BINS);

Rationale/Discussion

  • We recommend sqrt() for deciding the number of bins to balance between selecting bins (ledger-scope) and bin membership (local-scope).
  • If all bins can't have the same number of members, then the remainder is divided among the 'upper' bins. Monero's gamma distribution favors higher indices, so giving higher-indexed bins more members conforms with that idea.

Binning algorithm: tx construction

Define the bin configuration details and obtain the list of ring members for each input.

Constants and inputs

struct binning_config //[[[TODO: define these; consider using constructor to validate config settings]]]
{
    size_t BINNING_UPPER_BOUND_SELECTION_WIDTH;
    size_t BIN_RADIUS;
    size_t RING_SIZE;
    size_t NUM_BINS;
    ledger_distribution_config LEDGER_DIST_CONFIG;

    size_t max_index;   // max on-chain output index when making tx, assumed to be in 'spendable' range
    size_t min_index;   // min on-chain output index that can be used as a ring member
};

/// input variables
binning_config config;  // configuration for the binning algorithm
vector<size_t> spend_output_indices;    // indices of outputs to be spent

assert(max_index >= min_index);
assert(spend_output_indices.size() > 0);

/// leave early if there are not enough outputs on-chain to construct a ring
assert(height_from_output_index(max_index) - height_from_output_index(min_index) >= BINNING_UPPER_BOUND_SELECTION_WIDTH);

size_t min_binning_upper_bound_block = height_from_output_index(max_index) - BINNING_UPPER_BOUND_SELECTION_WIDTH;
size_t min_binning_upper_bound = get_last_output_index_of_block(min_binning_upper_bound_block);

assert(min_binning_upper_bound >= min_index);
assert(min_binning_upper_bound - min_index + 1 >= (2 x BIN_RADIUS + 1));

Rationale/Discussion

  • To construct a ring, you need enough blocks to cover the maximum binning upper bound selection-range and enough outputs to span a full bin member selection-range (i.e. a single bin) below the minimum binning upper bound. In practice, if this were implemented, you'd have to wait until enough binning-eligible outputs have been added to the chain before a tx using binning can be constructed.
  • The output at max_index must be 'spendable', meaning a transaction that references that output won't be rejected if immediately submitted to the network. Monero has a 10-block DEFAULT_SPENDABLE_AGE that disallows adding a transaction to the chain if the tx references an output from the previous 10 blocks.
    • If a transaction author is concerned that DEFAULT_SPENDABLE_AGE is not wide enough to protect their transaction from reorgs, then they should set max_index to an output from a sufficiently low block.

Ring seed

Compute a pseudo-random number for generating all the bins and bin members for the transaction's inputs. We use a pseudo-random number in place of public entropy.

// ring entropy
u128 ring_seed = H("ring_seed", tx.key_images, tx.pseudo_output_commitments);

Rationale/Discussion

  • The ring seed is a hash of both key images and pseudo output commitments for uniqueness between transaction attempts.
    • A robust Monero transaction builder always has unique pseudo-output commitments, either because at least one of them has a randomly-generated blinding factor (if there are more than one), or (if there is only one) because one blinding factor is set equal to the sum of the tx's new outputs' commitments' blinding factors (which are generated as hashes of sender-receiver shared secrets, which should be based on a randomly generated unique-per-tx-attempt transaction private key). This indirect inheritance of entropy can be considered 'not ideal' from a security standpoint, but is an optimization that piggybacks on a more serious security problem. If pseudo-output commitments made by a tx-builder are not apparently random, this represents a serious flaw because badly generated pseudo-output commitments can reveal the amounts spent by a transaction.
    • Multiple transaction attempts will reveal the true spend due to ring member intersections. The only way to avoid this (in this proposal) would be to use input_seed = H("input_seed", input.key_image) instead of ring_seed (i.e. a unique seed per input, that is constant between tx attempts), and also use the same binning_upper_bound for each attempt.
  • The ring seed is deterministic from tx details so tx verifiers can reconstruct the ring members for each input.
    • An alternative would be randomly generating a 16-byte ring_entropy and recording it explicitly in transactions. We don't see any meaningful advantage to that approach.

Binning upper bound

Select the binning upper bound, which is the maximum index that can be referenced by a bin. The same upper bound will be used for all inputs.

size_t bound_selection_max;
size_t bound_selection_min;

/// block where highest spendable output can be found
size_t max_block_index = height_from_output_index(max_index);
bound_selection_max = max_block_index;

/// find highest real-spend
size_t max_spend_index{0};

for (const auto spend_output_index : spend_output_indices)
{
    assert(spend_output_index <= max_index);
    assert(spend_output_index >= min_index);

    if (spend_output_index > max_spend_index)
        max_spend_index = spend_output_index;
}

// block where highest real-spend can be found
size_t max_spend_block = height_from_output_index(max_spend_index);
assert(max_spend_block <= max_block_index);

/// binning upper bound must be >= all real-spends
bound_selection_min = max_block_index - BINNING_UPPER_BOUND_SELECTION_WIDTH;

if (bound_selection_min < max_spend_block)
    bound_selection_min = max_spend_block;

// rand_from_range<T>(): select integral of type T randomly from a range [a, b]
size_t binning_upper_bound_block = rand_from_range<size_t>(bound_selection_min, bound_selection_max);

/// binning upper bound is last output in the 'binning upper bound block'
size_t binning_upper_bound = get_last_output_index_of_block(binning_upper_bound_block);

/// set fee
u64 fee = get_fee_from_height(height_from_output_index(binning_upper_bound), priority);

return: binning_upper_bound, fee

Rationale/Discussion

  • All inputs use the same binning_upper_bound to reduce storage in transactions. Beyond that, if there were unique upper bounds per input, then observers could use the average binning upper bound of multi-input transactions to more accurately estimate the time those tx were constructed.
  • The BINNING_UPPER_BOUND_SELECTION_WIDTH is the region to randomly select a binning upper bound from. Selecting it from a range of blocks instead of using a fixed distance from max_index makes it harder for observers to analyze precisely when a transaction was constructed.
  • Fees are set off the binning_upper_bound instead of max_index. In other words, the algorithmic minimum fee for the block containing the output with index binning_upper_bound should be used to set the transaction's fee. This mitigates observers' ability to use the transaction fee to estimate when a transaction was constructed.

Selecting bins

Deterministically select the bins for input input_index.

/// set input seed
u128 input_seed = H("input_seed", ring_seed, input_index);

/// prepare bin centers
vector<u64> bin_centers_flattened;
assert(NUM_BINS);
bin_centers_flattened.resize(NUM_BINS);

for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
    // rand_from_seed<T>(seed): pseudo-randomly generate integral type T using input 'seed' to seed the generator
    // note: this must be uniformly distributed
    // - simple solution: return seed mod T:max
    //  - requires: either seed_type:max mod T:max == 0, OR seed_type:max >>> T:max
    //  - and must assume input seed is uniformly distributed
    bin_centers_flattened[bin_index] = rand_from_seed<u64>(H("bin_centers", input_seed, bin_index));


/// set bin_rotator

// 1. randomly select the real bin's center from around the output
size_t real_bin_max = spend_output_indices[input_index] + BIN_RADIUS;
size_t real_bin_min = spend_output_indices[input_index] - BIN_RADIUS;

// snap bin bounds to allowed range (with adjustments in case of integer overflow)
if (real_bin_max > binning_upper_bound || real_bin_max < spend_output_indices[input_index])
    real_bin_max = binning_upper_bound;

if (real_bin_min < min_index || real_bin_min > spend_output_indices[input_index])
    real_bin_min = min_index;

size_t real_bin_center = rand_from_range<size_t>(real_bin_min, real_bin_max);

// 2. randomly select a bin to be the 'real one'
size_t real_bin_index = rand_from_range<size_t>(0, bin_centers_flattened.size() - 1);

// 3.map the real bin center into the uniform distribution
// map_index_to_ledger_probability_dist(ledger-wide distribution config, 'index' normalized, 'max-index' of binning normalized)
// returns: (probability a randomly generated number in ledger-wide selection distribution over range [0, 'max-index'] is <= than 'index') x max(u64)
// [[[TODO: implementation (may need to take into account block:output relationship, e.g. density issue)]]]
u64 real_bin_center_flattened = map_index_to_ledger_probability_dist(LEDGER_DIST_CONFIG, real_bin_center - min_index, binning_upper_bound - min_index);

// 3. map the selected bin onto the real spend's bin center via 'bin_rotator'

// mod_subtract(a, b, c): a - b mod c
u64 bin_rotator = mod_subtract(real_bin_center_flattened, bin_centers_flattened[real_bin_index], max(u64));


/// rotate all the bins

// mod_add(a, b, c): a + b mod c
for (auto &bin_center : bin_centers_flattened)
    bin_center = mod_add(bin_center, bin_rotator, max(u64));


/// convert bin centers to real index space
vector<size_t> bin_centers;
bin_centers.resize(NUM_BINS);

for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
{
    // map_ledger_probability_dist_to_index(ledger-wide distribution config, 'probability', 'max-index' of binning normalized)
    // returns: index in range [0, 'max-index'] where random selection from the ledger-wide distribution on that range will have 'probability' probability of being <= the return value
    // [[[TODO: implementation]]]
    bin_centers[bin_index] = map_ledger_probability_dist_to_index(LEDGER_DIST_CONFIG, static_cast<double>(bin_centers_flattened[bin_index])/max(u64), binning_upper_bound - min_index) + min_index;

    // snap bin centers into the available range for bin members
    if (bin_centers[bin_index] > binning_upper_bound - BIN_RADIUS)
        bin_centers[bin_index] = binning_upper_bound - BIN_RADIUS;

    if (bin_centers[bin_index] < min_index + BIN_RADIUS)
        bin_centers[bin_index] = min_index + BIN_RADIUS;

    assert(bin_centers[bin_index] <= binning_upper_bound - BIN_RADIUS);
    assert(bin_centers[bin_index] >= min_index + BIN_RADIUS);
}


/// sort the bin centers and update real bin index accordingly

size_t real_bin_center = bin_centers[real_bin_index];
bin_centers.sort();

// if there are duplicates of the real bin's center, then randomly select which bin will contain the spent output
size_t min_candidate_index{NUM_BINS};
size_t max_candidate_index;

for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
{
    if (bin_centers[bin_index] == real_bin_center)
    {
        if (min_candidate_index == NUM_BINS)
            min_candidate_index = bin_index;

        max_candidate_index = bin_index;
    }
}

real_bin_index = rand_from_range<size_t>(min_candidate_index, max_candidate_index);

return: input_seed, bin_centers, bin_rotator, real_bin_index

Rationale/Discussion

  • The real spend's bin center is uniformly distributed with respect to the real spend, so if the ledger-wide selection distribution across any bin's width is effectively uniform, then bin centers cannot be used to infer which bin member is the true spend.
    • If bins are too wide, so the ledger-wide selection distributin (and/or true spend distribution) is not effecitvely uniform across the width of each bin, then observers may be able to use the relative position of bin members with respect to bin centers to gain an advantage when guessing which bin member is the true spend.
  • The method of generating bin centers in the uniform distribution, rotating them onto the real spend's bin center, then mapping the bin centers into the ledger-wide selection distribution, is conceptually equivalent to selecting, at random, one of the permutations of selections from the ledger-wide selection distribution that just so happens to include the real spend's bin center. Since the transaction author chooses which generated bin center to map onto his real spend at random, observers cannot use bin_rotator to deduce which bin has the true spend unless the ledger-wide selection distribution is flawed.
  • Note that section Bins requires bins to be sorted because the bins with the highest bin centers will have more ring members than lower bins if there is a remainder when dividing ring members among bins (in the next section).

Selecting bin members

Deterministically select the bin members for input input_index.

/// select bin members

vector<vector<size_t>> bin_members_per_bin;
bin_members_per_bin.resize(NUM_BINS);

for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
{
    // get_num_bin_members(which bin, number of bins, number of ring members) - defined by secion 'Bins'
    size_t num_bin_members = get_num_bin_members(bin_index, NUM_BINS, RING_SIZE);
    assert(num_bin_members);
    assert(num_bin_members <= 2 x BIN_RADIUS + 1);
    bin_members_per_bin[bin_index].resize(num_bin_members);

    size_t bin_min_index = bin_centers[bin_index] - BIN_RADIUS;

    // deterministically generate bin members in the range [bin_min_index, bin_max_index]
    for (size_t bin_member_index{0}; bin_member_index < num_bin_members; bin_member_index++)
    {
        bool was_duplicate;
        size_t duplicate_nonce{0};

        // prevent duplicates within each bin
        do
        {
            was_duplicate = false;

            // mod_large(a, c): a mod c, where a >> c
            bin_members_per_bin[bin_index][bin_member_index] = mod_large(rand_from_seed<u128>(H("bin_members", input_seed, bin_index, bin_member_index, duplicate_nonce)), 2 x BIN_RADIUS + 1) + bin_min_index;

            for (size_t i{0}; i < bin_member_index; i++)
            {
                if (bin_members_per_bin[bin_index][i] == bin_members_per_bin[bin_index][bin_member_index])
                {
                    was_duplicate = true;
                    ++duplicate_nonce;

                    break;
                }
            }
        } while (was_duplicate);
    }
}


/// in real bin, randomly select a bin member to be the real one
size_t real_bin_member_index = rand_from_range<size_t>(0, bin_members_per_bin[real_bin_index].size() - 1);


/// define the bin member rotator
size_t bin_member_rotator;
size_t real_bin_min_index = bin_centers[real_bin_index] - BIN_RADIUS;

bin_member_rotator = mod_subtract(spend_output_indices[input_index] - real_bin_min_index, bin_members_per_bin[real_bin_index][real_bin_member_index] - real_bin_min_index, 2 x BIN_RADIUS + 1);


/// rotate all bin members and sort each bin
for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
{
    size_t real_bin_min_index = bin_centers[bin_index] - BIN_RADIUS;

    for (auto &bin_member : bin_members_per_bin[bin_index])
        bin_member = mod_add(bin_member - real_bin_min_index, bin_member_rotator, 2 x BIN_RADIUS + 1) + real_bin_min_index;

    bin_members_per_bin[bin_index].sort();
}


/// get real bin member index post-sorting

// note: there should be no duplicate bin members
for (size_t bin_member_index{0}; bin_member_index < bin_members_per_bin[real_bin_index].size(); bin_member_index++)
{
    if (bin_members_per_bin[real_bin_index][bin_member_index] == spend_output_indices[input_index])
    {
        real_bin_member_index = bin_member_index;

        break;
    }
}

return: bin_members_per_bin, bin_member_rotator, real_bin_member_index

Rationale/Discussion

  • Within each bin, bin members are selected pseudo-randomly then they are all rotated so that one of the bin members in the true spend's bin lands on the true spend. Since all bins have the same width, and the within the true spend's bin the bin member to rotate on to the true spend is selected at random by the transaction author, an observer cannot use bin_member_rotator to infer anything about which bin contains the true spend, nor which bin member in any given bin is more/less likely to be the true spend.
    • As noted in the previous section, if the ledger-wide selection distribution is not effectively uniform across the width of each bin, then observers will have an advantage when trying to guess which bin members are more/less likely to be the true spend.
  • For simplicity, the algorithm above permits bins to have the same center, and for bin members to be duplicates between different inputs or between different bins (but not within the same bin). [[[TODO: better for performance not to test for duplicates? ok for performance to test for duplicates more rigorously?]]]

Full ring member set

/// concatenate the bins together to get the full ring-member set; also get the real-spend's index in the ring
vector<size_t> all_bin_members;
all_bin_members.reserve(RING_SIZE);
size_t real_bin_member_index_in_ring{0};
bool passed_real;

for (const auto &bin_members : bin_members_per_bin)
{
    if (bin_members == bin_members_per_bin[real_bin_index])
    {
        real_bin_member_index_in_ring += real_bin_member_index;

        passed_real = true;
    }
    else if (!passed_real)
        real_bin_member_index_in_ring += bin_members.size();

    for (const auto bin_member: bin_members)
        all_bin_members.emplace_back(bin_member);
}

return: all_bin_members, real_bin_member_index_in_ring

Binning algorithm: tx validation

Recover the ring members of each input in a transaction tx from the bin configuration details.

/// inputs
Tx tx;
binning_config config;


/// recover ring members
u128 ring_seed = H("ring_seed", tx.key_images, tx.pseudo_output_commitments);
vector<vector<size_t>> all_ring_members_per_input;]

assert(NUM_BINS);
assert(tx.binning_upper_bound >= min_index + 2 x BIN_RADIUS + 1);

for (size_t input_index{0}; input_index < tx.inputs.size(); input_index++)
{
    u128 input_seed = H("input_seed", ring_seed, input_index);

    /// generate bin centers and rotate them
    vector<u64> bin_centers_flattened;
    bin_centers_flattened.resize(NUM_BINS);

    for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
        bin_centers_flattened[bin_index] = rand_from_seed<u64>(H("bin_centers", input_seed, bin_index));

    for (auto &bin_center : bin_centers_flattened)
        bin_center = mod_add(bin_center, tx.inputs[input_index].bin_rotator, max(u64));

    /// convert bin centers to real index space
    vector<size_t> bin_centers;
    bin_centers.resize(NUM_BINS);

    for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
    {
        // [[[TODO: implementation]]]
        bin_centers[bin_index] = map_ledger_probability_dist_to_index(LEDGER_DIST_CONFIG, static_cast<double>(bin_centers_flattened[bin_index])/max(u64), tx.binning_upper_bound - min_index) + min_index;

        if (bin_centers[bin_index] > tx.binning_upper_bound - BIN_RADIUS)
            bin_centers[bin_index] = tx.binning_upper_bound - BIN_RADIUS;

        if (bin_centers[bin_index] < min_index + BIN_RADIUS)
            bin_centers[bin_index] = min_index + BIN_RADIUS;

        assert(bin_centers[bin_index] <= tx.binning_upper_bound - BIN_RADIUS);
        assert(bin_centers[bin_index] >= min_index + BIN_RADIUS);
    }

    bin_centers.sort();

    /// generate bin members
    vector<vector<size_t>> bin_members_per_bin;
    bin_members_per_bin.resize(NUM_BINS);

    for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
    {
        size_t num_bin_members = get_num_bin_members(bin_index, NUM_BINS, RING_SIZE);
        assert(num_bin_members);
        bin_members_per_bin[bin_index].resize(num_bin_members);

        size_t bin_min_index = bin_centers[bin_index] - BIN_RADIUS;

        // deterministically generate bin members in the range [bin_min_index, bin_max_index]
        for (size_t bin_member_index{0}; bin_member_index < bin_members_per_bin[bin_index].size(); bin_member_index++)
        {
            bool was_duplicate;
            size_t duplicate_nonce{0};

            // prevent duplicates within each bin
            do
            {
                was_duplicate = false;

                bin_members_per_bin[bin_index][bin_member_index] = mod_large(rand_from_seed<u128>(H("bin_members", input_seed, bin_index, bin_member_index, duplicate_nonce)), 2 x BIN_RADIUS + 1) + bin_min_index;

                for (size_t i{0}; i < bin_member_index; i++)
                {
                    if (bin_members_per_bin[bin_index][i] == bin_members_per_bin[bin_index][bin_member_index])
                    {
                        was_duplicate = true;
                        ++duplicate_nonce;

                        break;
                    }
                }
            } while (was_duplicate);
        }
    }

    /// rotate all bin members
    assert(tx.inputs[input_index].bin_member_rotator < 2 x BIN_RADIUS + 1);

    for (size_t bin_index{0}; bin_index < NUM_BINS; bin_index++)
    {
        size_t bin_min_index = bin_centers[bin_index] - BIN_RADIUS;

        for (auto &bin_member : bin_members_per_bin[bin_index])
            bin_member = mod_add(bin_member - bin_min_index, tx.inputs[input_index].bin_member_rotator, 2 x BIN_RADIUS + 1) + bin_min_index;

        bin_members_per_bin[bin_index].sort();
    }

    /// concatenate bins for full set of ring members
    all_ring_members_per_input[input_index].reserve(RING_SIZE);

    for (const auto &bin_members : bin_members_per_bin)
        for (const auto bin_member: bin_members)
            all_ring_members_per_input[input_index].emplace_back(bin_member);
}

return: all_ring_members_per_input

Rationale/Discussion

  • Note that there is a lot of duplication between this section and the earlier sections, so in practice the algorithm can be condensed.

Modifications for small ring sizes

This algorithm can be adjusted so all bins only have one member each, which can applied as an optimization/upgrade to Monero's current transaction protocol. Doing so would reduce input reference bytes from (num_inputs x ring_size x varint) to (varint + num_inputs x (u64 + varint)).

  • Set:
    • NUM_BINS = RING_SIZE
    • BIN_RADIUS = 0
    • These settings imply that get_num_bin_members() will only return 1, and that each bin center will equal its lone bin member.
  • Exclude bin_member_rotator from bin configuration details.

Tx changes

Modify the transaction input structure and the form of the message signed by transaction inputs.

  • As @TheCharlatan explains below, the txin_to_key struct should contain a vector of commitments to outputs (i.e. hashes of each output).
  • The txin_to_key struct should also contain the values bin_rotator and bin_member_rotator introduced by this proposal.
  • The transaction prefix hash should include binning_upper_bound from this proposal:
m_p = Hash(version, unlock_time, binning_upper_bound, {txin_i}_i, {txout_i}_i, extra)

Tx validation

When you encounter a tx to validate, recover the ring members for each input from the bin configuration details defined in this proposal.

Blockchain and tx hashes

  1. A transaction hash includes all tx data.
  2. Output references (e.g. all binning details) are stored in the blockchain as part of tx structs. Since output references and the outputs themselves are part of tx hashes, the blockchain can only be re-validated with the correct output references for each tx.

Rationale

Why not use the binning strategy from this paper?

  • The paper's strategy requires more data. Individual bins are referenced with independent identifiers, instead of implicitly via a hash function and pseudo-random seed.
  • In the paper's strategy, bins are implementation-defined, which means they can't be enforced by the consensus protocol. The result would be less transaction uniformity if multiple transaction-builders use different methods to define bins.
  • Bin selection in the paper is relatively inferior to this proposal. In the paper, the real spend's bin is 'added' to the set of decoy bins, whereas in this proposal we select a set of bins directly from the ledger-wide selection distribution that just so happens to include the real spend's bin. Doing so eliminates the ability of observers to guess which bin is least likely to have been selected from the ledger-wide selection distribution.
    • Note that this proposal's approach does not solve analysis that spans many transactions. For example, if all transactions have a true spend in a bin that is exactly 1 day old, then when looking at the bins selected by many transactions, the over-selection of 1-day-old bins will stand out.
  • Bin member selection in the paper is somewhat haphazard and does not address edge cases effectively (e.g. true spends that don't land in deterministic bins).

Backward compatibility

This proposal changes the structure of tx and tx validation rules, so it requires a hard fork. It is well-suited to being implemented in the same hard fork that rolls out Triptych (or a similar next-gen tx protocol with large ring sizes), but is also compatible with Monero's current transaction protocol.

License

MIT license

References

UkoeHB avatar Jun 01 '21 02:06 UkoeHB

This proposal has a lot of potential to fix many of the UX problems of Monero. If constructed carefully it could also get rid of the 10 confirmation spending requirement, allow transaction mempool replaceability, and fee bumping techniques. To make the above proposal work, the signature message structure needs to be changed. Adopting some of the notation for MLSAG message construction from the hardware wallet paper the current signature message structure m is:

txin_iis the input struct txout_j is the output struct ECDH_i represent encrypted amounts C_i are the output commitments m_p = Hash(version, unlock_time, {txin_i}_i, {txout_i}_i, extra) is the transaction prefix hash m = Hash(m_p || Hash({ECDH_i}_i || {C_i^0}_i) || Hash({rangeProofData_i}_i)) is the signature message

In the code txin_i and txin_out are encoded in the transaction_prefix, m_p in the above notation, in cryptonote_basic.h. Both are encoded as a collection of types, two of which, txin_to_script/txout_to_script and txin_to_scripthash/txout_to_scripthash, are unused. txin_gen encodes coinbase inputs, while txin_to_key/txout_to_key are used for all other transactions. txout_to_key contains the output one time key.

txin_to_key is defined with the fields:

struct txin_to_key { 
    uint64_t amount;
    std::vector<uint64_t> key_offsets;
    crypto::key_image k_image;  // double spending protection

The transaction prefix hash is what needs to be changed. Instead of txin_to_key encoding a vector of key offsets, it should encode a vector of commitments to the specific output. This can be achieved by creating a new input type struct and adding a vector of the one-time public key of the output, the output index in the transaction (the analog of vout in Bitcoin) and the transaction id of the transaction containing the output to the txin_to_key struct. This ensures that a truly unique output is selected from.

The transaction ID construction currently uses the same transaction_prefix_hash as well. Any malleability caused by adding the floating bin index later on would not change the transaction id if it would still be reusing the function for the signature message hashing. At the same time this is problematic, since data outside of a transaction would need to be hashed as well. Instead, two distinct transaction ID's can be computed, similar to what Bitcoin has done to eliminate transaction malleability. One including the binning information and one without.

TheCharlatan avatar Jun 12 '21 15:06 TheCharlatan

Probably a stupid question: if both transactions need to be constructed before the first one is sent, how does that fix the UX issue in Monero regarding the 10 confirmation spending requirement? Couldn't the user just construct one transaction with multiple outputs?

boogerlad avatar Jun 24 '21 04:06 boogerlad

@boogerlad I think the idea is if you spend an output added to the chain in the recent 10 blocks, then if the chain gets reorged, your tx can survive if the other tx also survives. The output you spend would be 'floating', so miners can adjust the floating offset when there is a reorg.

UkoeHB avatar Jun 24 '21 04:06 UkoeHB

I see. Can step 2 and 3 be swapped?

boogerlad avatar Jun 24 '21 05:06 boogerlad

Sure they can, the steps are just demonstrating tx chaining (making tx when output you are spending doesn't exist in chain yet).

UkoeHB avatar Jun 24 '21 05:06 UkoeHB

Clever construction. Just one thought about maintaining transaction indistinguishability and fungibility: if we allow any transaction to have a floating ring member, then the protocol should enforce that every transaction has a floating ring member. Then it won't stand out when people do use the feature.

Mitchellpkt avatar Jun 30 '21 16:06 Mitchellpkt

I find the concept of a hash-based bin selection interesting, because assuming the algorithm is reproducible across CPUs(!), the transaction size should be much smaller than listing each ring member. Or did I misunderstand that part of the proposal?

@boogerlad I think the idea is if you spend an output added to the chain in the recent 10 blocks, then if the chain gets reorged, your tx can survive if the other tx also survives. The output you spend would be 'floating', so miners can adjust the floating offset when there is a reorg.

If attempting to spend immediately, then this also leaks the real spend? If so, then this is certainly a "non-starter" if a user has to choose between privacy and immediate spending. This hurts the on-chain privacy of everyone.

vtnerd avatar Jun 30 '21 20:06 vtnerd

I find the concept of a hash-based bin selection interesting, because assuming the algorithm is reproducible across CPUs(!), the transaction size should be much smaller than listing each ring member. Or did I misunderstand that part of the proposal?

Yes it is hash-based and reproducible across CPUs.

If attempting to spend immediately, then this also leaks the real spend? If so, then this is certainly a "non-starter" if a user has to choose between privacy and immediate spending. This hurts the on-chain privacy of everyone.

@vtnerd If all tx have a floating index in the same range (relative to chain height), then observers won't necessarily know if floating index in last 10 blocks is due to a fast spend or randomness.

The key to this is defining a floating output selection zone (constants BINNING_UPPER_BOUND_SELECTION_WIDTH and MIN_ALLOWED_WIDTH_FLOATING) that doesn't give away information in most cases.

UkoeHB avatar Jun 30 '21 21:06 UkoeHB

love the binning and chaining. I'm pondering the specific use-case of the 1 floating ring member. IMO, an ideal monero protocol would be able to survive a complete netsplit - i.e., 2 subnetworks form that end up completely isolated from each other. Then, at some other time, the two networks become reconnected. Afaiu, it should be theoretically possible for the two subnetworks blockchains to reform into a single chain in a massive re-org. For instance, in bitcoin I think this is possible - the txs in the alt chain would just be dumped to the txpool, and they get mined back into the mainchain.

From what I'm understanding from this proposal, the 1 floating ring member wouldn't accommodate such an event.

I mean, I get it, the single floating ring member is designed for a specific use case of tx-chaining and not really the possible but improbable netsplit. However, I'm always a fan of making the protocol more robust. cockroach factor. E.g., can monero be designed to survive nuclear war?

Gingeropolous avatar Jul 01 '21 22:07 Gingeropolous

@Gingeropolous the 'Reorg' section discusses netsplits. You can't have both a binning strategy (to minimize reference storage) and recover from arbitrarily large netsplits, but you can recover from small reorgs.

UkoeHB avatar Jul 01 '21 22:07 UkoeHB

Relevant article on a real-world use-case that would benefit from this today:

https://comit.network/blog/2021/07/02/transaction-presigning/

It is our assumption that anything that relies on joint-outputs and pre-signing spending transactions doesn't actually work on present day Monero. We haven't done an extensive analysis on what this affects but our guess is that this applies to at least anything Layer2 but likely also to other blockchain protocols developed for Monero. Most "interesting" blockchain protocols today work on the basis of a joint-owned output with spending transactions which leak secrets when broadcast. As long as we want to remain trustless, we have to create valid spending transactions before the actual output gets mined, otherwise we are dependent on cooperating with the other party to unlock our funds.

sethforprivacy avatar Jul 02 '21 12:07 sethforprivacy

The key to this is defining a floating output selection zone (constants BINNING_UPPER_BOUND_SELECTION_WIDTH and MIN_ALLOWED_WIDTH_FLOATING) that doesn't give away information in most cases.

"in most cases" is not a good phrase in the given context. The problem is that the hash-based approach should require some information leakage about the real spend, correct? And technically this occurs in all contexts afaik.

In other words, with this approach the signer has to "work-backwards" from the real-spend and put some information so that the verifier is guaranteed to use it in the ring. The signer must do this because a purely hash-based approach will never "select" the desired output. So instead, we have this scheme where the signer tries to obfuscate the real spend, but in so doing is leaking more information than the current approach.

This scheme is just too complicated and is going to have subtle leaks to be acceptable.

vtnerd avatar Jul 02 '21 16:07 vtnerd

The problem is that the hash-based approach should require some information leakage about the real spend, correct? And technically this occurs in all contexts afaik.

I'm not sure what you mean by this. The scheme described here only leaks:

  • a ballpark of when the tx was constructed (a function of BINNING_UPPER_BOUND_SELECTION_WIDTH); I don't see any way to not leak this while using a gamma distribution for ring members (i.e. we already leak this).
  • if BINNING_UPPER_BOUND_SELECTION_WIDTH + MIN_ALLOWED_WIDTH_FLOATING is too large, then the probability of the floating output being the real spend will be greater than 1/RING_SIZE (I believe there is a theoretical optimal size for this zone based on the gamma distribution; these are non-consensus constants as well, and can be adjusted algorithmically). Note that if there is no floating index, no binning upper bound width, and all real spends match the gamma distribution, then in this scheme the probability any ring member is the real spend will be 1/RING_SIZE.
  • if the floating output is outside the random selection zone width, then it is almost guaranteed to be the real spend; I view this as a vector for 'opting-out' of privacy guarantees. It is impossible for Monero to have tx chaining without this vector.

UkoeHB avatar Jul 02 '21 17:07 UkoeHB

I'm not sure what you mean by this. The scheme described here only leaks:

There is more information about the real spend than the current approach. Or rather, there is more subtle metadata about the real-spend in these floating values than the current dead-simple approach. The problem stems from the hash-based approach for ring selection - it forces the signer to commit to some value with a relationship to the real spend.

I view this as a vector for 'opting-out' of privacy guarantees

Which hurts the privacy of others typically because this output cannot be used in a ring past or present. Monero has been moving away from optional features for this reason, particularly with ring-selection. The size of the ring isn't even signer selectable for instance.

vtnerd avatar Jul 02 '21 17:07 vtnerd

Or rather, there is more subtle metadata about the real-spend in these floating values than the current dead-simple approach. The problem stems from the hash-based approach for ring selection - it forces the signer to commit to some value with a relationship to the real spend.

Metadata leakage about the real spend, in the context of a floating output, is mostly independent of how ring members are selected (the use of hashes here is irrelevant because everything important is randomized). If we had pure random selection over a gamma distribution, then the floating index would still leak information. Maybe even the same amount of information, because with large rings you can more accurately extrapolate the 'distribution upper bound' (i.e. the max index when selecting from the gamma distribution).

The problem is all ring members must be known when constructing a transaction. Practically speaking, this means all non-spends must exist on-chain already. If all your decoys are selected from on-chain, then the chain height when you selected ring members cannot be greatly obfuscated. Floating outputs that are too far away from that chain height cannot be hidden.

Do you have a suggestion how tx chaining can be done better? Or are you arguing tx chaining should not be supported at all?

UkoeHB avatar Jul 02 '21 17:07 UkoeHB

Metadata leakage about the real spend, in the context of a floating output, is mostly independent of how ring members are selected (the use of hashes here is irrelevant because everything important is randomized). If we had pure random selection over a gamma distribution, then the floating index would still leak information. Maybe even the same amount of information, because with large rings you can more accurately extrapolate the 'distribution upper bound' (i.e. the max index when selecting from the gamma distribution).

You need to elaborate on all cases where the real spend is leaked. So far it appears when tx-chaining is used, it definitely leaks the real spend to miners and anyone watching the mempool. I don't see how this is an acceptable tradeoff if Monero's privacy can be weakened simply by using a feature to use outputs quicker. And as I already stated, this hurts the privacy of everyone on-chain because it reduces the anonymity set.

What I'm also pointing out, and you appear to be dodging through unclear language, is that even in the normal case metadata is leaked. This actually easiest to see in the verification algorithm. The signer does not control the seed for the randomizer function, but still has to get the verifier to select the "real" output in a "bin". So there's a connection between the real-spend and these offset values (that are stored on-chain), that is not present in the current approach

The problem is all ring members must be known when constructing a transaction. Practically speaking, this means all non-spends must exist on-chain already. If all your decoys are selected from on-chain, then the chain height when you selected ring members cannot be greatly obfuscated. Floating outputs that are too far away from that chain height cannot be hidden.

Yes, I understand the problem you are trying to solve.

Do you have a suggestion how tx chaining can be done better? Or are you arguing tx chaining should not be supported at all?

I have no reason to "not support" tx chaining (i.e. the question is non-sense, of course I would want something like tx chaining in Monero). However, tx chaining should not be added if it reduces privacy. I don't see how its possible to have both, unless the tx size is increased dramatically.

vtnerd avatar Jul 02 '21 18:07 vtnerd

You need to elaborate on all cases where the real spend is leaked.

"warning: if floating_index - binning_upper_bound > BINNING_UPPER_BOUND_SELECTION_WIDTH + MIN_ALLOWED_WIDTH_FLOATING, then an input is guaranteed to be flagged as floating=real & real=chained"

Aside from this

  • if BINNING_UPPER_BOUND_SELECTION_WIDTH + MIN_ALLOWED_WIDTH_FLOATING is too large, then the probability of a floating output being real is > 1/RING_SIZE due to overlap with the gamma distribution (if you integrate the gamma distribution from [binning upper bound, chain height at tx construction], and result/(integrate entire curve) > 1/RING_SIZE, then the chance the real spend is floating will be disproportionate to other ring members)
  • observers can use {unknown at the time of analysis} real-world heuristics to discern the true spend; things like differences between the true spend distribution and the gamma distribution, patterns of use for tx chaining, etc.

I don't see how this is an acceptable tradeoff if Monero's privacy can be weakened simply by using a feature to use outputs quicker. And as I already stated, this hurts the privacy of everyone on-chain because it reduces the anonymity set.

This argument is fine with me and definitely needs to be discussed. I don't have a strong opinion either way - tx chaining was requested by other people. However, I'd like to disentangle it from more technical criticisms of the scheme.

What I'm also pointing out, and you appear to be dodging through unclear language, is that even in the normal case metadata is leaked. This actually easiest to see in the verification algorithm. The signer does not control the seed for the randomizer function, but still has to get the verifier to select the "real" output in a "bin". So there's a connection between the real-spend and these offset values (that are stored on-chain), that is not present in the current approach

Rather than dodging it, these sections are WIP and I have not gotten around to explaining why it works. I agree if the offsets leak any metadata then the scheme is broken. The scheme is supposed to:

  1. if the real spend is in a bin, randomize the 'center' of the bin relative to the true spend index
  2. use a uniformly distributed offset to map one hash-derived bin center onto the real spend's bin center
  3. within the real spend's bin, use a uniformly distributed offset to map one hash-derived bin member onto the real spend's index

If each of those is uniformly distributed, then no information is leaked about which bin member is the real spend. In other words, it is equally probable that the offsets map any of the hash-generated things onto the real bin/bin member.

I have no reason to "not support" tx chaining (i.e. the question is non-sense, of course I would want something like tx chaining in Monero). However, tx chaining should not be added if it reduces privacy.

I don't appreciate the antagonistic pedantics. Everything 'added' to Monero is 'supported' by Monero, so arguing that tx chaining shouldn't be 'added' means you are arguing it shouldn't be 'supported' (in my view of the meaning of these words....).

UkoeHB avatar Jul 02 '21 19:07 UkoeHB

If each of those is uniformly distributed, then no information is leaked about which bin member is the real spend. In other words, it is equally probable that the offsets map any of the hash-generated things onto the real bin/bin member.

I doubt this claim is accurate, but I'm going to need something closer to pseudo-code/Python before going further - the real-spend is the only one where the offsets were manipulated to work and these offsets are recorded permanently in the blockchain.

I don't appreciate the antagonistic pedantics. Everything 'added' to Monero is 'supported' by Monero, so arguing that tx chaining shouldn't be 'added' means you are arguing it shouldn't be 'supported' (in my view of the meaning of these words....).

Your original statement was "Or are you arguing tx chaining should not be supported at all?", and I was stating that I can be against your approach, but in-favor of the feature in general. Perhaps there is no way to add tx-chaining without reducing privacy (for others) or increasing tx size, in which case the feature should not be supported. That's typically how the Monero community has done feature selection in the past at least.

If the ring-selection algorithm can filter out definitely leaked real-outputs, this changes the argument because other participants are less affected by tx-chaining (it lowers potential anonymity set size but not the ring security). I believe this is possible, but likely complicates the ring-selection even further.

vtnerd avatar Jul 02 '21 20:07 vtnerd

If the ring-selection algorithm can filter out definitely leaked real-outputs, this changes the argument because other participants are less affected by tx-chaining (it lowers potential anonymity set size but not the ring security). I believe this is possible, but likely complicates the ring-selection even further.

As a quick follow-up: everytime a ring member is identified as a real-spend, it weakens the privacy of other rings where it was used as a "dummy". This was why ring size 1 was banned, but this case is different if all participants can see the "leak" during transaction construction. It's still on the edge of what the Monero community has supported in the past as this creates two distinct sets of Monero transactions, whereas all of the analytic researchers are trying to eliminate anything different in transactions. Their ideal is all uniform cryptographic data.

vtnerd avatar Jul 02 '21 20:07 vtnerd

As a quick follow-up: everytime a ring member is identified as a real-spend, it weakens the privacy of other rings where it was used as a "dummy". This was why ring size 1 was banned, but this case is different if all participants can see the "leak" during transaction construction. It's still on the edge of what the Monero community has supported in the past as this creates two distinct sets of Monero transactions, whereas all of the analytic researchers are trying to eliminate anything different in transactions. Their ideal is all uniform cryptographic data.

I had similar thoughts. My take was if most txs that need to use floating outputs can be 'hidden' among normal txs, then a floating offset style may be worthwhile. If most txs using floating outputs cannot be hidden (e.g. because floating outputs are mostly used for tx-chaining, and the delay between tx signing and floating output being added to chain is too big in most of those cases), then floating offsets are no better than a 1-member ring, so we might as well support two input types (pure binning, and 1-member rings) if we want floating outputs.

When the proposal gets more fleshed out I will be sure to discuss this aspect of the topic, thank you for bringing it up.

UkoeHB avatar Jul 02 '21 21:07 UkoeHB

If most txs using floating outputs cannot be hidden (e.g. because floating outputs are mostly used for tx-chaining, and the delay between tx signing and floating output being added to chain is too big in most of those cases), then floating offsets are no better than a 1-member ring

In this scenario, the privacy of Monero has arguably dropped considerably. We've allowed people a technique to "shoot themselves in the foot" (Monero is private, unless you do this one thing!), and the total anonymity set has dropped weakening any future gains from Triptych. The more the discussion continues, I don't see how I could support the floating offsets idea (when it is a defacto 1-member ring).

In fact, why not just allow tx chaining with 1 member rings? Its less complicated and more clear on whats actually happening. Which...

we might as well support two input types (pure binning, and 1-member rings) if we want floating outputs.

If this approach is taken, "pure binning" would need a separate discussion for its inclusion because it is no longer required for tx-chaining to work. Although you've suggested as such with the title "Tx Chaining and Ring Binning".

vtnerd avatar Jul 02 '21 22:07 vtnerd

In fact, why not just allow tx chaining with 1 member rings? Its less complicated and more clear on whats actually happening.

The problem from my pov is, for example, that all tx trying to get around the 10-block lock time would have to use the 1-member rings. But at least some of those tx could be obfuscated among slower spends using this proposal, so there is an opportunity cost.

While outputs spent by 1-member rings can be blackballed, they can only be blackballed after they have been spent. Rings created before then can be polluted. Plus, you can just as easily blackball floating outputs from this proposal that are unequivocally real-spends.

  • (1-member rings): can blackball all outputs after they have been spent
  • (this proposal): can blackball all outputs that are unequivocally real-spends after they have been spent; some of the remaining outputs may not be heuristically neutral

So, if I were to advocate 1-member rings over this proposal, I would have to think that the pollution of ring signatures with heuristically non-neutral outputs is a greater cost (combined with the chain cost of useless ring signatures created for unequivocal real-spends) than the benefit of allowing those outputs (plus a set of outputs that are heuristically neutral) to be spent in ring signatures. I don't have enough information to judge this trade-off.

The more the discussion continues, I don't see how I could support the floating offsets idea

Floating offsets are very easy to chop out of this proposal if ultimately not desired.

UkoeHB avatar Jul 02 '21 23:07 UkoeHB

Plus, you can just as easily blackball floating outputs from this proposal that are unequivocally real-spends.

Then why not just use 1-member rings in these cases. Its simpler for all parts of the code, and there's no hiding the weakened privacy/security in obscure algorithms.

some of the remaining outputs may not be heuristically neutral

Can you clarify this point? A proposal where the real-spend is known initially but is later hidden isn't acceptable because it creates "land-mines" in the output selection process. It gives chain-analysis companies that are constantly monitoring the mempool an advantage over Monero wallets that are not constantly monitoring the mempool.

So to repeat - if a real-spend can be determined after leaving the node of origin, then the 1-member ring approach should be used. Its simpler and is effectively the same thing. The community dropped such rings previously though, which is ultimately why I have been critical of this part of the idea.

Floating offsets are very easy to chop out of this proposal if ultimately not desired.

Ok, but the binning approach still requires much scrutiny as I suspect it also leaks metadata.

vtnerd avatar Jul 03 '21 04:07 vtnerd

some of the remaining outputs may not be heuristically neutral

Can you clarify this point?

In this proposal, a floating output with an index way higher than the binning upper bound is unequivocally a real spend (unless the tx author is messing around or there is a wonky implementation). However, floating outputs close to the upper bound might be real spends or decoys.

If those floating outputs are heuristically neutral, then there exist no heuristics that, when used, increase the observer's chance of correctly guessing if one of them is a true spend. There is no way to guarantee that no such heuristic exists, so for the sake of argument we can assume that at least some proportion of floating outputs may not be heuristically neutral (i.e. guessing that they are the true spend has > 1/RING_SIZE chance of being correct).

These hypothetical floating outputs with heuristic weaknesses would not exist in the 1-member ring scenario, so the fact they would be able to pollute ring signatures is a cost that must be added to the analysis (even if in practice it is insignificant).

Note: Even pure ring signatures over a gamma distribution might not be heuristically neutral if the true spend distribution does not match the gamma distribution. An analyst who knows the true spend distribution has a heuristic advantage for detecting true spends.

A proposal where the real-spend is known initially but is later hidden isn't acceptable because it creates "land-mines" in the output selection process.

This is not possible here.

So to repeat - if a real-spend can be determined after leaving the node of origin, then the 1-member ring approach should be used. Its simpler and is effectively the same thing. The community dropped such rings previously though, which is ultimately why I have been critical of this part of the idea.

I don't have strong opinions about the best approach. My goal is to finish the proposal, explain it as best I can, and adjust it according to other people's discussion. Perhaps talking to those who are invested in floating outputs (be it via this proposal or simple 1-member rings) would be more productive toward reaching a final consensus @TheCharlatan.

The reason I did not propose 1-member rings originally was two-fold:

  • at least some floating output real-spends can be obfuscated by ring signatures
  • it isn't clear how beneficial it would be to support both this proposal and 1-member rings (i.e. for unequivocal floating output spends), compared to the implementation effort

UkoeHB avatar Jul 03 '21 05:07 UkoeHB

Note: Even pure ring signatures over a gamma distribution might not be heuristically neutral if the true spend distribution does not match the gamma distribution. An analyst who knows the true spend distribution has a heuristic advantage for detecting true spends.

Why "ring binning"? Ignore "tx chaining" for second - would this design still be recommended without that use case? Does it result in smaller transaction sizes, etc.? Because it certainly does not improve on the heuristics of a bad RNG for the gamma distribution stage.

Edit: Above you mentioned O(1) scaling, but this scheme does not appear to be that. So I'm primarily looking for a comparison of your actual approach instead of an abstract case.

at least some floating output real-spends can be obfuscated by ring signatures

This portion of my response is about "tx chaining" (spending outputs currently in the mempool or less than 10 blocks).

Can you list the scenarios where this is possible. Let me help you - is the transaction being broadcast multiple times or is the miner manipulating the transaction? Because if there are re-broadcasts by the wallet this gives away some metadata to people watching the mempool (it was a quick spend). OR if the miners can do it, what magic allows them to adjust the indexes without knowing the real spend (they would have to know the real spend to set the indexes)? Just walk through the transaction chaining scenario a bit more.

I'm insisting on specifics here because I suspect once its implemented if people knew how it worked they would reject it (1-member rings achieve the same thing without the obfuscation and complicated algorithm).

it isn't clear how beneficial it would be to support both this proposal and 1-member rings (i.e. for unequivocal floating output spends), compared to the implementation effort

This portion of my response is about "tx-chaining"

1-member rings are dead simply compared to this approach. And I'm not supporting 1-member rings but they allow transaction chaining easily.

vtnerd avatar Jul 03 '21 14:07 vtnerd

Why "ring binning"? Ignore "tx chaining" for second - would this design still be recommended without that use case? Does it result in smaller transaction sizes, etc.? Because it certainly does not improve on the heuristics of a bad RNG for the gamma distribution stage.

Yes it would still be recommended.

  • "We show this binned sampling ensures privacy even in spite of a compromised sampling distribution." source
  • The storage required is very small: varint + num_inputs*(u8 + varint [+ varint for floating offsets]). There aren't any other proposals to compare it with, but I doubt you will find anything that uses less storage.

at least some floating output real-spends can be obfuscated by ring signatures Can you list the scenarios where this is possible. Let me help you - is the transaction being broadcast multiple times or is the miner manipulating the transaction? Because if there are re-broadcasts by the wallet this gives away some metadata to people watching the mempool (it was a quick spend). OR if the miners can do it, what magic allows them to adjust the indexes without knowing the real spend (they would have to know the real spend to set the indexes)? Just walk through the transaction chaining scenario a bit more.

Miners can adjust the floating offset, they don't need to know which output is the real spend. Check out the 'Reorgs' section of the proposal for a full discussion.

A ring with 128 ring members looks like this: [ 127 members: deterministic from binning details ] | 1 floating member

The floating member is referenced like floating_index = floating_offset + binning_upper_bound. The floating_offset is not signed by tx authors, so miners can change it to match the real location of the floating member, or tx authors can set it after the tx has been constructed. In tx chaining, the real spend is the floating member. You wait until the tx you chain on top of is added to the ledger before setting the floating offset value.

at least some floating output real-spends can be obfuscated by ring signatures

Every tx input has a floating member. Most of those will be decoys. Tx authors with decoy floating members must select the decoys from a range of blocks above the binning_upper_bound. This range has a certain maximum width. If a real-spend floating member is within that range, then an observer won't know for sure if it is a decoy or real-spend. If they have good heuristics, they may be able to guess with probability of success > 1/RING_SIZE if the floating member is real or not.

Tx authors won't always know in advance if their real-spend floating member will fall into the 'decoy range'. This is because the final index of that member may be unknown.

UkoeHB avatar Jul 03 '21 20:07 UkoeHB

Yes it would still be recommended.

  • "We show this binned sampling ensures privacy even in spite of a compromised sampling distribution." source
  • The storage required is very small: varint + num_inputs*(u8 + varint [+ varint for floating offsets]). There aren't any other proposals to compare it with, but I doubt you will find anything that uses less storage.

The source paper has less variables and the selection algorithm is vastly more simple. Can you explain why the changes from the paper? It appears to be for the reorg/chaining case. This also means the claims of the paper no longer apply and analysis of your altered approach needs to be done. The approach in the paper is simpler than what you have here.

Miners can adjust the floating offset, they don't need to know which output is the real spend. Check out the 'Reorgs' section of the proposal for a full discussion.

In tx chaining, the real spend is the floating member. You wait until the tx you chain on top of is added to the ledger before setting the floating offset value.

Ok, your alteration to the paper is more clear now. You incorrectly answered my question above though. Nodes constantly watching the mempool have an advantage. The recipient always knows when transaction chaining has been enabled (the floating offset is the real spend), people watching the mempool can assume its the real spend when floating_offset is spent before 10 blocks (or whatever cutoff), but the blockchain is not guaranteed to store this transaction within the 10-block (or whatever) cutoff.

I now understand why your answers have been a bit all over the place (from my perspective), there isn't a guaranteed way to avoid some of these transactions after-the-fact. I'm still advocating for 1-member rings instead of your proposal for this reason. Also a reminder to anyone bothering to read this: the paper supplied above has a different algorithm and scheme that is simpler, and I would support that scheme over this one (although I haven't done enough reading to say whether the claims in that paper are accurate).

vtnerd avatar Jul 03 '21 20:07 vtnerd

You incorrectly answered my question above though. Nodes constantly watching the mempool have an advantage. The recipient always knows when transaction chaining has been enabled (the floating offset is the real spend), people watching the mempool can assume its the real spend when floating_offset is spent before 10 blocks (or whatever cutoff), but the blockchain is not guaranteed to store this transaction within the 10-block (or whatever) cutoff.

This response confuses me. Decoy floating members can and should be selected from the most recent 10 blocks. Why wouldn't they be? It's true that the presence of a floating output in the last 10 blocks may be a heuristic that increases the likelihood of it being a real spend, but it is not any 'guaranteed' information. This does not contradict my earlier comments.

How would a recipient know anything more than observers about inputs to a tx? EDIT: ah, if the recipient knows they are receiving an output from a chained tx, then they will know the floating output is a real-spend.

It's true that nodes watching the mempool have a slight advantage. They witness all reorgs, so they will have some slight timing information about transactions that a later observer won't have. I had not considered this (not that my failure to think of something warrants your aggressive attitude).

I now understand why your answers have been a bit all over the place (from my perspective), there isn't a guaranteed way to avoid some of these transactions after-the-fact. I'm still advocating for 1-member rings instead of your proposal for this reason.

You seem to be assuming most people would use the blackball tool. Maybe @SamsungGalaxyPlayer can give some insight on how many people actually use it in practice.

UkoeHB avatar Jul 03 '21 21:07 UkoeHB

This response confuses me. Decoy floating members can and should be selected from the most recent 10 blocks. Why wouldn't they be? It's true that the presence of a floating output in the last 10 blocks may be a heuristic that increases the likelihood of it being a real spend, but it is not any 'guaranteed' information. This does not contradict my earlier comments.

Yes, I probably responded too quickly previously -

If Bitcoin is any indication, there would be a strong-bias towards the floating_index being the real spend. And the gamma distribution cannot select within this range (did I miss that part? there's multiple overlapping knobs in this thing), so that changes heuristics/statistics tremendously. Its basically telling you that x% of the time, the floating_index will be the real spend. This should help chain-analysis ... ?

I had not considered this (not that my failure to think of something warrants your aggressive attitude).

I'm being harsh because you made statements like "We show this binned sampling ensures privacy even in spite of a compromised sampling distribution." source, but failed to mention that the algorithm presented here differs from the paper significantly. In some pretty major ways in fact. Even using the hash function as a seed (its not entropy as claimed) forces the signer to go through a gate that requires additional knobs to "hide" the real spend. This should also help statistics/heuristics identify the real-spend, but its going to take a while for me to show that one.

And even after all that, the binning approach is sub-optimal to the bin_size == 1 algorithm that's currently in use (because it reduces the gamma distribution sampling making the "noise" smaller).

How would a recipient know anything more than observers about inputs to a tx? EDIT: ah, if the recipient knows they are receiving an output from a chained tx, then they will know the floating output is a real-spend.

Sorry got my explanation jumbled up. The entire thing is dodgy because when a new block arrives, anyone can broadcast a chained transaction for outputs in the last block. Transactions that go out immediately are more likely to be real-spends than ones delayed even by 20 seconds, etc. Its basically telling mempool watchers the real-spenders. I suppose wallets could have a randomized delay/backoff that matches a gamma distribution towards the 10-block cutoff ... but there's no way to enforce this at the policy at the node (aside from our current approach of banning outputs within last 10 blocks).

The problem is "ahh my wallet just selected a 9-block wait time, no way, let me kludge that to 1 block in my copy of the code".

EDIT: In comparison, the current approach allows for multiple alternate decoys to be selected in the 10 block during the gamma distribution stage.

You seem to be assuming most people will use the blackball tool. Maybe @SamsungGalaxyPlayer can give some insight on how many people actually use it in practice.

I'm assuming that most people will not use the blackball tool. It should be trivial for signers to "skip" 1-member ring outputs but the idea is crap anyway (there's a reason why they were banned).

floating_index is trickier and basically requires the blackball approach afaik. And even then its someone doing messy heuristics that will never be exactly what chain-analysis companies have.

vtnerd avatar Jul 03 '21 23:07 vtnerd

I'm being harsh because you made statements like "We show this binned sampling ensures privacy even in spite of a compromised sampling distribution." source, but failed to mention that the algorithm presented here differs from the paper significantly.

I was responding to your question "Why "ring binning"?" Binning is recommended by that paper, hence 'the use of binning' is still recommended. I was not using that quote to claim this proposal's design is flawless. Hopefully this clears up any misunderstanding...

Even using the hash function as a seed (its not entropy as claimed) forces the signer to go through a gate that requires additional knobs to "hide" the real spend. This should also help statistics/heuristics identify the real-spend, but its going to take a while for me to show that one.

I was admittedly using the term 'entropy' haphazardly. The ring_seed (renamed) borrows entropy from the pseudo-output commitments.

If the hash-based generation of bins and bin members has any statistical flaws, then I will happily revise this proposal. My intention is not to break Monero... just to pursue the most efficient possible ring binning algorithm. The tradeoff between complexity and tx size can be made after alternatives are well understood (a work-in-progress for everyone, I think).

EDIT: In comparison, the current approach allows for multiple alternate decoys to be selected in the 10 block during the gamma distribution stage.

Not sure what you mean. Due to the 10-block locktime enforced by consensus, 'the current approach' cannot reference outputs in the last 10 blocks.

And even then its someone doing messy heuristics that will never be exactly what chain-analysis companies have.

I agree with this. It is easier to assess the implications of a 'floating output' with a concrete algorithm. Can a floating output scheme with some elements of privacy exist? I want to at least analyze something concrete, try to address criticisms with concrete adjustments, etc. I have neither the authority nor coding competence to force Monero into using this stuff.

Maybe if you were the only stakeholder in the system I wouldn't bother going through the effort, but floating outputs and a maximally-optimized algorithm for binning were both requested by @TheCharlatan, so this discussion will not find its conclusion that easily.

UkoeHB avatar Jul 06 '21 00:07 UkoeHB