nano-node
Fix accounts_receivable RPC command with sorting and count
Until now, the accounts_receivable RPC command retrieved only the first count blocks and then sorted them by amount. The correct flow is to get all the receivable blocks, sort them by amount, and apply count at the end, which is what this PR does.
Related issue: https://github.com/nanocurrency/nano-node/issues/3801
I'd like to improve the algorithm we use here since we're in the area.
The first issue I see is that all the pending entries are added and the result is only clamped to the requested size at the end. This consumes memory proportional to the total number of pending entries, but we can solve it more efficiently: every time we add an item to the map, we check whether the map has exceeded the requested size and, if so, erase the minimum item.
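A minimal sketch of that bounded insertion, assuming a std::multimap keyed by amount; the amount and value types here are placeholders rather than the node's actual types:

// Sketch only: keep at most `count` entries, keyed by amount, while iterating.
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

void insert_bounded (std::multimap<uint64_t, std::string> & results, uint64_t amount, std::string const & hash, std::size_t count)
{
	results.emplace (amount, hash); // equal amounts keep insertion order (stable)
	if (results.size () > count)
	{
		// Drop the smallest amount so memory stays proportional to `count`,
		// not to the total number of pending entries.
		results.erase (results.begin ());
	}
}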
The second issue is the conditional logic for checking whether sorting was requested. This can be replaced with a strategy pattern that only needs to be constructed once; the rest of the algorithm then proceeds simply by iterating all pending entries and passing them through the strategy object.
It would be something like: ledger pending entry -> strategy -> result map -> boost ptree
I see that across the RPCs we use 3 different ways to sort things: maps, sorting a vector, and sorting directly in the boost ptree. We should use maps unless there's a quantifiable reason not to, since they'll generally be the most efficient. Stable sort is guaranteed by std::multimap: "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)" https://en.cppreference.com/w/cpp/container/multimap
We would have a class "result_strategy" with a pure virtual function result_strategy::operator () (std::pair<nano::pending_key, nano::pending_info>). The main rpc handler function would iterate all pending entries and pass in each item. The non-sorted version would simply insert each item into the ptree. The sorted version would hold an std::multimap<nano::pending_key, nano::pending_info> as state, and operator () would insert/trim on each call. The sorted destructor would insert all items from the std::multimap into the boost ptree.
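A rough sketch of that shape, with a simplified entry type standing in for std::pair<nano::pending_key, nano::pending_info>; this is only an illustration of the idea described above, not the node's actual classes:

// Sketch only: hypothetical strategy interface for accumulating pending entries.
#include <boost/property_tree/ptree.hpp>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

struct pending_entry // placeholder for std::pair<nano::pending_key, nano::pending_info>
{
	std::string hash;
	uint64_t amount;
};

struct result_strategy
{
	virtual ~result_strategy () = default;
	// Called once per ledger pending entry by the RPC handler loop.
	virtual void operator() (pending_entry const & entry) = 0;
};

// Non-sorted version: write each entry straight into the response tree.
struct unsorted_result_strategy final : result_strategy
{
	explicit unsorted_result_strategy (boost::property_tree::ptree & result) :
		result (result)
	{
	}
	void operator() (pending_entry const & entry) override
	{
		result.put (entry.hash, entry.amount);
	}
	boost::property_tree::ptree & result;
};

// Sorted version: accumulate into a multimap keyed by amount, trim to the
// requested size on each insert, and flush into the tree in the destructor.
struct sorted_result_strategy final : result_strategy
{
	sorted_result_strategy (boost::property_tree::ptree & result, std::size_t count) :
		result (result),
		count (count)
	{
	}
	void operator() (pending_entry const & entry) override
	{
		sorted.emplace (entry.amount, entry.hash);
		if (sorted.size () > count)
		{
			sorted.erase (sorted.begin ()); // drop the smallest amount
		}
	}
	~sorted_result_strategy () override
	{
		for (auto it = sorted.rbegin (); it != sorted.rend (); ++it)
		{
			result.put (it->second, it->first); // largest amounts first
		}
	}
	boost::property_tree::ptree & result;
	std::size_t count;
	std::multimap<uint64_t, std::string> sorted;
};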
Future work would be making this a template that could be used in the other places we do this same pattern to eliminate duplicate code.
Thank you, good points, I'll try to work on that.
@clemahieu do we want to introduce sorting field in wallet_receivable?
PR updated. Following your suggestion, I used the strategy pattern on two layers - one for sorting/non-sorting and one for the response format (which fields to use in the response). I unified all three receivable commands, so that all of them support sorting and the same set of options - source, min_version, offset, sorting, threshold, count. Let me know if this is not desired for some reason and should be reverted. The next step, I think, would be unification of tests, possibly covering more scenarios. I just wanted your opinion on the implementation and unification first.
Hi @clemahieu, I know it's been a long time and it's not a priority now, but would you be able to take a look at the changes here and comment on whether this is what you expected?
This is a great time to revisit this, just as a new cycle is about to begin. @JerzyStanislawski , to make it easier to review your code, could you describe your design? A diagram would be awesome, if possible.
Sure, so the issue is related to 3 methods - receivable, accounts_receivable and wallet_receivable - in which we apply a pretty similar algorithm. In the current solution we iterate through pending blocks and build the result tree, applying conditional logic based on input parameters. Then we sort the tree (if necessary, according to a request parameter). This causes a few problems:
- Sorting with the count parameter may return incorrect results - when we stop processing after count iterations we may miss a block with a higher amount. That's the essence of this issue.
- Performance - iterating and then sorting at the end is unnecessary when we can build sorted results while iterating.
- Conditional logic - many ifs applied on each iteration.
- Code duplication across the 3 methods above.
The solution that Colin suggested is based on the strategy pattern, so instead of applying conditions on each iteration we define the behaviour in a separate class. In fact I created two levels of strategy. The behavioural one defines how we handle the blocks (do we sort them, do we apply the count parameter, etc.) - this is the result_strategy abstract class. It contains a result_format_strategy - an abstract class whose implementations define what to put into the result JSON - this is the second level.
So the RPC method implementations are pretty simple now (a rough sketch of the loop follows the list below):
- First, create the proper result_strategy implementation object based on input parameters.
  - This involves creating a result_format_strategy, also based on input parameters.
- Iterate pending blocks:
  - Check with the strategy object if iteration should be continued (depends on the count parameter).
  - Check with the strategy if the block should be skipped according to the offset parameter.
  - Apply the strategy behaviour to the block.
- In case of sorting_result_strategy, its destructor (end of iterations) takes care of creating the result from the multimap. Other implementations work directly on the result ptree.
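For illustration, a sketch of the shape of that handler loop; should_continue, should_skip and the entry type are hypothetical placeholders, not the PR's actual interface:

// Sketch only: the shape of the RPC handler loop; the strategy interface
// (should_continue, should_skip, operator()) and entry type are placeholders.
#include <cstdint>
#include <string>
#include <vector>

struct pending_entry
{
	std::string hash;
	uint64_t amount;
};

template <typename Strategy>
void handle_receivable (std::vector<pending_entry> const & pending, Strategy & strategy)
{
	for (auto const & entry : pending)
	{
		if (!strategy.should_continue ()) // stop once `count` results have been collected
		{
			break;
		}
		if (strategy.should_skip (entry)) // honour the `offset` parameter
		{
			continue;
		}
		strategy (entry); // insert directly into the ptree, or accumulate for sorting
	}
	// For the sorting strategy, its destructor writes the accumulated multimap
	// into the response ptree once the caller destroys the strategy object.
}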
This solution solves the problems described above. Maybe the disadvantage is the number of additional classes to maintain? Maybe we should move them to a separate namespace? Anyway, I don't consider this the final PR; I would like to add more test cases, but first I wanted to hear your opinion on whether this is a good direction at all. Hope it's understandable, let me know if you need further clarification.
With the upcoming upgrade to C++20, this feature could be implemented using the new ranges library. It provides all the necessary tools for sorting, filtering and pagination without the need to reimplement our own, which will ease the burden of maintaining our own algorithms for that purpose.
@JerzyStanislawski would you like to redo this using the C++20 ranges library as suggested by Piotr? There is a C++20 branch in development right now: https://github.com/clemahieu/nano-node/tree/cpp20
If yes, before you start coding, let's have a meeting to decide the architecture of the solution, so we are all on the same page.
Yeah, happy to rework that. My initial understanding is that we should first filter pending transactions to get confirmed blocks for the account, then apply other ranges functions according to input parameters (offset, sorting).
If this is correct I could redo one of the receivable methods and show it to you for review first; otherwise feel free to set up a meeting.
Hi @JerzyStanislawski, the develop branch is now using C++20, so we can start using the C++20 ranges library to implement this. Do you have an account on our Nano discord, or some other chat account, where we can discuss this?
The plan is to add 2 wrapper classes around pending_store to enable interaction with ranges. They must satisfy the concepts required by ranges, something like:
class pending_store_iterator_wrapper
{
public:
	pending_store_iterator_wrapper (nano::store_iterator<nano::pending_key, nano::pending_info> & store_iterator) :
		store_iterator (store_iterator)
	{
	}
	pending_store_iterator_wrapper & operator++ ()
	{
		++store_iterator;
		return *this;
	}
	// other methods/constructors/operators to satisfy the input_iterator concept
private:
	nano::store_iterator<nano::pending_key, nano::pending_info> store_iterator;
};
class viewable_pending_store
{
public:
	nano::pending_store_iterator_wrapper begin () const;
	nano::pending_store_iterator_wrapper end () const;
	// other methods/constructors/operators to satisfy the viewable_range concept
private:
	nano::pending_store pending_store;
};
So viewable_pending_store.begin() will internally call pending_store.begin() and construct a pending_store_iterator_wrapper from the nano::store_iterator<nano::pending_key, nano::pending_info> returned by pending_store.
Once we have that, we should be ready to use viewable_pending_store with ranges. We'll probably pipe operations according to input parameters (a rough sketch follows the list):
- filter with a block_confirmed predicate (always)
- drop for the offset parameter
- take for the count parameter
- sort if the sort parameter is set
- transform to construct the output
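A minimal sketch of how that could look with C++20 ranges, using placeholder types and parameters. One caveat: std::ranges::sort is an algorithm rather than a view, so the sorted path materializes the filtered entries first, and the sort has to happen before drop/take are applied or the original count-before-sort bug comes back:

// Sketch only: C++20 ranges for a receivable-style query; the entry type,
// block_confirmed predicate and parameter handling are illustrative.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <ranges>
#include <string>
#include <vector>

struct pending_entry
{
	std::string hash;
	uint64_t amount;
	bool confirmed;
};

std::vector<pending_entry> receivable (std::vector<pending_entry> const & pending, std::size_t offset, std::size_t count, bool sorting)
{
	auto block_confirmed = [] (pending_entry const & e) { return e.confirmed; };
	// filter is always applied; materialize the confirmed entries first.
	std::vector<pending_entry> confirmed;
	for (auto const & e : pending | std::views::filter (block_confirmed))
	{
		confirmed.push_back (e);
	}
	if (sorting)
	{
		// Sorting must happen before offset/count are applied, otherwise we
		// reintroduce the bug this issue is about (missing higher amounts).
		std::ranges::sort (confirmed, std::ranges::greater{}, &pending_entry::amount);
	}
	// drop for the offset parameter, take for the count parameter.
	std::vector<pending_entry> result;
	for (auto const & e : confirmed | std::views::drop (offset) | std::views::take (count))
	{
		result.push_back (e);
	}
	// A views::transform (or a plain loop) would then build the boost ptree response.
	return result;
}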
I don't have strict plan on how to use ranges in receivable methods yet, first I'd like to code PoC with wrapper classes to ensure it works.
I wouldn't worry about making a dedicated wrapper for each store; we need to have the results in memory anyway for sorting and serialization. Simply creating a std::vector { store.begin(key), store.end() } should be fine in this case.
And if you want to adapt an existing store to work with ranges, maybe the standard subrange class will do, like this: auto view = std::ranges::subrange(store.begin(key), store.end())
There is a lot of overhead with adapting, so I'd be happy to avoid that, but can we afford it in terms of performance? Correct me if I'm wrong, but I think loading the vector and then running the ranges operations would iterate all elements twice, while with adapting it would be once. Sorting is an exception - with it the elements are always iterated twice. I'm aware this gain is small in comparison with loading into memory, but still... Please confirm.
Using std::ranges::subrange should have the same behavior as your adapter class without writing any new code. I haven't tested it, but looking at the documentation it generates an adapter view without loading elements into memory, so I would go with this first. The overhead of iterating elements that are already in memory is minuscule in comparison to loading them from disk in the first place, so I definitely wouldn't worry about iterating the vector a second time either.
The rest of the code (filtering/sorting/transforming) should be the same regardless of the choice here, so it should be possible to easily profile those two approaches and see if there is any meaningful difference.
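For illustration, a sketch of the two approaches side by side, using a std::map as a stand-in for the pending store (the real store type, key lookup and downstream pipeline are out of scope here):

// Sketch only: comparing materializing into a vector vs wrapping the store
// iterators in a subrange; a std::map plays the role of the pending store.
#include <cstdint>
#include <map>
#include <ranges>
#include <string>
#include <utility>
#include <vector>

int main ()
{
	std::map<std::string, uint64_t> store{ { "hash1", 3 }, { "hash2", 1 }, { "hash3", 2 } };

	// Approach 1: materialize the results into a vector, then run ranges algorithms on it.
	std::vector<std::pair<std::string, uint64_t>> materialized (store.begin (), store.end ());

	// Approach 2: wrap the existing iterators in a subrange; elements are only
	// pulled from the store as the pipeline consumes them.
	auto view = std::ranges::subrange (store.begin (), store.end ());

	// Either `materialized` or `view` can feed the same filter/drop/take pipeline,
	// which makes it easy to profile the two approaches against each other.
	uint64_t total = 0;
	for (auto const & entry : view | std::views::take (2))
	{
		total += entry.second;
	}
	return total > 0 ? 0 : 1;
}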