bitshares1-core icon indicating copy to clipboard operation
bitshares1-core copied to clipboard

Market history commands bad performance

Open svk31 opened this issue 10 years ago • 15 comments

I'd like to add order histories to the bitsharesblocks asset pages but the performance of the above rpc call is making it impossible. Simply looping over the 25 or so market assets and getting the order history (only versus BTS) takes about 15 seconds on my personal computer. Would it be possible to add an index or otherwise speed up this call?

svk31 avatar Dec 31 '14 09:12 svk31

@svk31 What is your primary use-case? For example: listing market history for all market-pegged assets over the last day's worth of blocks

Given the use case I'll see how we can arrange things in the backend.

vikramrajkumar avatar Jan 20 '15 20:01 vikramrajkumar

@valzav Do you use this command in the GUI? What is the normal use-case?

vikramrajkumar avatar Jan 21 '15 22:01 vikramrajkumar

I use it to display blockhain orders history on the market page

valzav avatar Jan 21 '15 22:01 valzav

@valzav Do you show only the user's order history or all history? How many transactions do you limit it to?

vikramrajkumar avatar Jan 21 '15 22:01 vikramrajkumar

I show both blockchain and user's history. blockchain_market_order_history's limit is 500

valzav avatar Jan 21 '15 22:01 valzav

@vikramrajkumar That sounds like a reasonable use case yea, I'd also like to use it for user issued assets. The most recent data is definitely most interesting here, querying by date as in blockchain_market_price_history would be useful for me but not essential.

Compared to the wallet use case the big difference for me is that I want to gather ALL the different combinations of quote and base assets, so it can get pretty heavy..

svk31 avatar Jan 22 '15 16:01 svk31

Also blockchain_market_price_history: https://github.com/BitShares/web_wallet/issues/637#issuecomment-91795936

vikramrajkumar avatar Apr 13 '15 17:04 vikramrajkumar

We're using a structurestd::map<market_history_key(quote_id, base_id, granularity, start_time), value> to store market_price_history data, maybe it's the reason why the blockchain_market_price_history API is slow? It's said that iterating over large std::map object is slow (in compare with std:vector). Will it help if we change it to somewhat like this structure: std::map<market_history_key(quote_id, base_id, granularity), std::vector<pair<start_time, value>>>? In this way iterating will be much faster, looking up will be not slower if use a binary search algorithm inside the vector, and the cost for storing new data is almost constant.

abitmore avatar Apr 15 '15 10:04 abitmore

A B-tree scales better than a vector and provides similar sequential read performance.

abitmore avatar Apr 15 '15 11:04 abitmore

Some thoughts about blockchain_market_order_history:

  • with current implement, order history is stored in map<uint32_t, vector<market_transaction>>. It works well for blockchain_list_market_transactions API which has a block_num parameter
  • blockchain_market_order_history traverses this map backwards from the end, skips the records which don't match, until reach the start or reach a limit. This behavior kills performance in some cases. For an active market pair, it's fast to get enough records and reach the limit then return; but for a market pair which is not active enough, it will probably walk though the entire map before return.
  • blockchain_market_order_history has these parameters: <quote_symbol> <base_symbol> [skip_count] [limit] [owner]
    • for good performance, at least an index for key pair<quote_symbol, base_symbol> is needed
    • the [skip_count] and [limit] parameters seem to be here for some kind of pagination. For best performance with these parameters, the value of above key should be an array. Since arrays/vectors don't scale well, maybe we need a B-tree. However, most time we query for most recent records only, so a map would be not too bad.
    • the [owner] parameter is interesting. Since value of owner spreads widely, if this parameter is used frequently, a dedicated index for it would help a lot.

Further thoughs:

  • blockchain_market_order_history API has no parameter like start_time or end_time which would be useful in some cases.
  • for memory efficiency, it's best not store everything in memory, but only store the most 'hot' data, for example last 1000 records or last N days, and leave the old data on disk.

abitmore avatar Apr 16 '15 16:04 abitmore

Some of my earlier thoughts are not accurate. The map is a container, which could be implemented using BST (binary search tree, which std::map uses) or other data structure (like B-tree). Fortunately it's easy to change the underlying data structure with current fc library if we need.

However, all these trees don't work well with big [skip_count] parameter, unless there is a children_count field maintained in every tree-node, or a global numeric market_trx_id added to every market transaction and maintain a dedicated index on it.

Can we change this parameter to [end_block_num] or [end_time]? Who're using [skip_count] with non-zero value? @svk31 @valzav

abitmore avatar Apr 17 '15 03:04 abitmore

I'm using 0 for the skip_count on my backend and it's the same in the web_wallet. I don't really see the usefulness of the skip count so I'm all for removing it.

Like I said above querying for dates would be useful but not essential, if the call is fast enough we can easily just filter based on dates on the frontend.

svk31 avatar Apr 17 '15 06:04 svk31

While you're looking at this, could you look into why it takes so long for an order to appear in the order history? There's a delay of more than 20 minutes between the execution time of an order and the time when it finally appears in the results of blockchain_market_order_history.

This can be quite confusing when you've made an order in the wallet but the order does not appear in the order history until much later. It also means that information like low, high and latest price can be badly out of sync, as they're taken from the order history.

svk31 avatar Apr 17 '15 08:04 svk31

On second thought I would prefer to have this command use dates instead of number of transactions for the query. The reason is you can never know how many transactions are in a day, so if you want to get the high and low for the last 24 hours you need to fetch a lot of transactions, hoping that you've got enough. For batch operations on low liquidity markets this means you'll probably be fetching order histories going way further than the 24 hours you're interested in.

svk31 avatar Apr 18 '15 07:04 svk31

Pull request submitted, please review. The blockchain_market_order_history API has been rewritten. Test needed. @svk31 I prefer leave the behavior of this API as is, and add another time-range query API. Didn't have time to see why there is a delay yet.. Since the API has been rewritten I believe there should be no such issue again. I may have a look later. By the way, the low, high and latest pricing issue should have been fixed in #1300.

abitmore avatar Apr 20 '15 00:04 abitmore