status-go icon indicating copy to clipboard operation
status-go copied to clipboard

Usage of transfer history fetch

Open alaibe opened this issue 2 years ago • 41 comments

Problem

Calculate the usage of full transfer history fetch

If acceptable for mobile, the on demand strategy can removed

alaibe avatar Aug 01 '23 09:08 alaibe

fetching tx history for one of my accounts (which has 550+ tx per etherscan (that excludes incoming ERC20 transfers in most of cases) and something around 50 incoming ERC20) takes 65-85MB of data and more than 7k requests

what is a bit suspicious (although I don't know in details how that works) is that each time I run it i get a different number of eth_TransactionReceipt and eth_FullTransactionByBlockAndIndex (maybe @IvanBelyakoff knows whether that's expected)

rasom avatar Aug 08 '23 12:08 rasom

@rasom How different are they? Are you using rpcstats.CountCall for that? Did you remove the data from DB each time? Because we don't detele the data if account is removed.

The only thing that comes to my mind irrespective of how this is tested and measured is that every time it is measured new blocks have been generated and history checked intervals are different which may result in slight variations of calls to find the blocks that have ETH transfers. For ERC20 I think that the difference can be even smaller if there is.

IvanBelyakoff avatar Aug 08 '23 13:08 IvanBelyakoff

@IvanBelyakoff

How different are they?

so I logged three runs so far, I have 367/221, 422/282, 306/247 for eth_TransactionReceipt/eth_FullTransactionByBlockAndIndex

Are you using rpcstats.CountCall for that?

yes

Did you remove the data from DB each time?

I do remove entire multiacc (status account, not wallet account) and from data usage point of view it looks like the database is also removed (or not used after account recovery via phrase), otherwise it would be hard to explain 65-85MB of traffic.

history checked intervals are different which may result in slight variations of calls to find the blocks that have ETH transfers

Yes I'm aware of it, that's why I wrote specifically about calls which counts shouldn't vary per my understanding.

rasom avatar Aug 08 '23 13:08 rasom

I also just noticed that for that run which had 306/247 there were only 264 eth_BlockByNumber which is likely not enough to scan entire history of that wallet account (500+ transfer and most of them in different blocks), so it looks like that scanning stalls on mobile at some point. I have to figure out what causes this and then rerun tests.

rasom avatar Aug 08 '23 13:08 rasom

sooo, it stops loading historical data after the first "check recent blocks" request like

t=2023-08-08T14:03:54+0000 lvl=info msg="wallet historical downloader for erc20 transfers start" chainID=1 address=0x2f88d65f3cB52605a54A833ae118fb1363aCccd2 from=17,8 70,678 to=17,870,678 t=2023-08-08T14:03:54+0000 lvl=info msg="iterative downloader" address=0x2..................2 from=17,870,678 to=17,870,678 size=500,000 t=2023-08-08T14:03:54+0000 lvl=info msg="wallet historical downloader for erc20 transfers finished" chainID=1 address=0x2f88d65f3cB52605a54A833ae118fb1363aCccd2 from=1 7,870,678 to=17,870,678 time="378.692µs" headers=0

rasom avatar Aug 08 '23 14:08 rasom

Yes I'm aware of it, that's why I wrote specifically about calls which counts shouldn't vary per my understanding

Yes, you are right, missed it.

sooo, it stops loading historical data after the first "check recent blocks" request like

But there is only one block to check (to == from). Are you sure this is not just because it was checked. There are 2 concurrent commands going on - checking for history and checking the newest block. From what I see, this is a newest block checked

IvanBelyakoff avatar Aug 08 '23 14:08 IvanBelyakoff

I also just noticed that for that run which had 306/247 there were only 264 eth_BlockByNumber which is likely not enough to scan entire history of that wallet account (500+ transfer and most of them in different blocks), so it looks like that scanning stalls on mobile at some point. I have to figure out what causes this and then rerun tests.

Roman sorry to hear that. I just realized I never got back this code to write some tests for the new sequential strategy. So that could possibly be a bug if it does not load all the transfers occasionally. I might need to write some tests, especially since you found out that it possibly does not load all the transfers

IvanBelyakoff avatar Aug 08 '23 14:08 IvanBelyakoff

@IvanBelyakoff I'm still not 100% that this is not due to the way how API is used on mobile side

Do you have some document or maybe you can describe what is the order of calls in order to fetch that entire history? Like I just call wallet_checkRecentHistory, or I start with wallet_start, the do wallet_checkRecentHistory and wait, or what exactly should be done?

rasom avatar Aug 08 '23 14:08 rasom

You don't have control over it for this strategy. Once you add the account, downloader should fetch all the history (well, for now it is not true, it will fetch to the block when the first eth transfer happened, and stop even for ERC20 - currently by design, to save calls). Once new transfers loaded

  • new-transfer event is sent and UI should react and fetch the transfers based on filters set or just all in chunks.. some "getTransferByAddress" should do it - just fetches from db.

Why don't you measure this using desktop app or some test sample?

On Tue, Aug 8, 2023, 16:55 Roman Volosovskyi @.***> wrote:

@IvanBelyakoff https://github.com/IvanBelyakoff I'm still not 100% that this is not due to the way how API is used on mobile side

Do you have some document or maybe you can describe what is the order of calls in order to fetch that entire history? Like I just call wallet-checkRecentHistory, or I start with wallet_start, the do wallet-checkRecentHistory and wait, or what exactly should be done?

— Reply to this email directly, view it on GitHub https://github.com/status-im/status-go/issues/3833#issuecomment-1669780446, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADXHGVJIZ2NOEC24XT4A6TTXUJHOHANCNFSM6AAAAAA27PFRPU . You are receiving this because you were mentioned.Message ID: @.***>

IvanBelyakoff avatar Aug 08 '23 20:08 IvanBelyakoff

@IvanBelyakoff so to clarify

  1. I was told that on desktop we do fetch entire history right away (that to be the difference with an old version)
  2. I was told to check whether this will be suitable for mobile

Why don't you measure this using desktop app or some test sample?

Because I'm more comfortable with mobile version atm and it shouldn't matter. Why is it better to test it on desktop?

You don't have control over it for this strategy.

What I see is that entire history is not loaded and it took 25 minutes to get to block 5,885,783 (on WIFI). That required 14k+ requests. At the moment I'm a bit puzzled because this algorithm neither actively tries to save RPC calls (because 14k+) nor loads entire history. So what's the point here? Or there were some recent changes and I have to update local version of status-go where it works differently?

t=2023-08-09T07:29:00+0000 lvl=info msg="logging initialised" logSettings="{\"Enabled\":true,\"MobileSystem\":false,\"Level\":\"INFO\",\"File\":\"\\/storage\\/emulated\\/0\\/Android\\/data\\/im.status.ethereum.debug\\/files\\/Download\\/geth.log\"}"

t=2023-08-09T07:54:47+0000 lvl=info msg="eth historical downloader finished successfully" chain=1 address=0x2...2 from=5,885,783 to=5,985,783 total blocks=13 time=2.666043385s

rasom avatar Aug 09 '23 08:08 rasom

@IvanBelyakoff do you still call wallet_checkRecentHistory on desktop?

rasom avatar Aug 10 '23 12:08 rasom

So what I do:

  1. restore an existing account via phrase with WalletConfig.LoadAllTransfers = true to enable that sequential strategy
  2. call "wallet_getRecentHistory"
  3. on "recent-history-ready" call "wallet_getTransfersByAddress"

What I get:

  1. some part of history is fetched right away. It takes 3 to 20+ minutes depending on how lucky I am, and it takes 60-120MB of traffic, 7k to 15k eth_ requests
  2. it stops somewhere around block 5,885,760
t=2023-08-10T17:06:28+0000 lvl=info msg="wallet historical downloader for erc20 transfers finished" chainID=1 address=0x2f88d65f3cB52605a54A833ae118fb1363aCccd2 from=5,885,760 to=5,985,760 time=460.406875ms headers=7
  1. On restart it never catches up with older blocks

This is definitely not optimal for a mobile network. Arguably it will be costly in terms of RPC calls eventually, but that depends on number of users we will have.

@IvanBelyakoff would be nice if you could confirm that what I do on the client side is actually expected to be done this way.

rasom avatar Aug 10 '23 17:08 rasom

And as a note: we had to execute significantly less RPC requests in order to get to the same 5,885,760 block in the older version of the app, it is something around 6k agains 13k (for those specific runs). At least it looks so per RPC usage screen.

old: Screenshot 2023-08-10 at 19 29 51

new: Screenshot 2023-08-10 at 19 29 51

rasom avatar Aug 10 '23 17:08 rasom

hm maybe it is just some issue with wallet_getTransfersByAddress, seems that transfers are loaded but I don't get them from wallet_getTransfersByAddress

upd.: yep I can confirm that it was rather pagination issue and mobile app would stop loading older transactions to the client side, likely there were some changes on backend related to this

rasom avatar Aug 10 '23 17:08 rasom

Hey @rasom I was trying to reach out a couple of days ago via Status app, but you probably never received my message, and I could not find you on Discord that day, then it slipped my mind sorry.

@IvanBelyakoff do you still call wallet_checkRecentHistory on desktop?

I've checked the code and believe we still do. At least I don't see that something wrong happening if you do that

2. it stops somewhere around block 5,885,760

I believe this is the beginning of ETH history for the account

3. On restart it never catches up with older blocks

If that is indeed the beginning of ETH history, then it wont proceed loading older blocks on restart even for ERC20. We know this is strictly speaking wrong, but on some chains there are 100+ million blocks and with this strategy it is very consuming as for requests to check all that history even in chunks of multiple thousands requests per call (getLogs), and considering that there are several chains, in this sense we do save some calls for now, because in more cases there wont be any ERC20 transfers than will. I've discussed that with @alaibe and @John-44 a while ago and we decided we just stop there for the time being.

So what I do:

  1. restore an existing account via phrase with WalletConfig.LoadAllTransfers = true to enable that sequential strategy
  2. call "wallet_getRecentHistory"
  3. on "recent-history-ready" call "wallet_getTransfersByAddress"

Though I think we don't call getTransfersByAddress now, but we use filtering in activity.go to display transfers, I think the right call we use is getMultiTransactions but for this purpose of measuring, it does not matter. getMultiTransactions will give you even smaller number of transfers of some of them are part of single transactions.

And as a note: we had to execute significantly less RPC requests in order to get to the same 5,885,760 block in the older version of the app, it is something around 6k agains 13k (for those specific runs). At least it looks so per RPC usage screen.

old: Screenshot 2023-08-10 at 19 29 51

new: Screenshot 2023-08-10 at 19 29 51

As for the test results, I see that for some reason NonceAt is missing in the old strategy, and there are some calls like FullTransactionByBlockAndNumber which fixes an issue with wrong block hash fetched from header (if I remember correctly) and GetBaseFeeFromBlock - I have no idea now what we use it for, actually, that together make 2 requests more, but I just think that those calls might have not been included in the old strategy and are used for a reason, sothe numbers after the fixes could be more even between new and old.

Do you know why you don't have "NonceAt" call in the list? I remember it should be called to check for transfers within block range

IvanBelyakoff avatar Aug 11 '23 08:08 IvanBelyakoff

Hey @rasom I was trying to reach out a couple of days ago via Status app, but you probably never received my message, and I could not find you on Discord that day, then it slipped my mind sorry.

@IvanBelyakoff do you still call wallet_checkRecentHistory on desktop?

I've checked the code and believe we still do. At least I don't see that something wrong happening if you do that

  1. it stops somewhere around block 5,885,760

I believe this is the beginning of ETH history for the account

  1. On restart it never catches up with older blocks

If that is indeed the beginning of ETH history, then it wont proceed loading older blocks on restart even for ERC20. We know this is strictly speaking wrong, but on some chains there are 100+ million blocks and with this strategy it is very consuming as for requests to check all that history even in chunks of multiple thousands requests per call (getLogs), and considering that there are several chains, in this sense we do save some calls for now, because in more cases there wont be any ERC20 transfers than will. I've discussed that with @alaibe and @John-44 a while ago and we decided we just stop there for the time being.

So what I do:

  1. restore an existing account via phrase with WalletConfig.LoadAllTransfers = true to enable that sequential strategy
  2. call "wallet_getRecentHistory"
  3. on "recent-history-ready" call "wallet_getTransfersByAddress"

Though I think we don't call getTransfersByAddress now, but we use filtering in activity.go to display transfers, I think the right call we use is getMultiTransactions but for this purpose of measuring, it does not matter. getMultiTransactions will give you even smaller number of transfers of some of them are part of single transactions.

And as a note: we had to execute significantly less RPC requests in order to get to the same 5,885,760 block in the older version of the app, it is something around 6k agains 13k (for those specific runs). At least it looks so per RPC usage screen.

As for the test results, I see that for some reason NonceAt is missing in the old strategy, and there are some calls like FullTransactionByBlockAndNumber which fixes an issue with wrong block hash fetched from header (if I remember correctly) and GetBaseFeeFromBlock - I have no idea now what we use it for, actually, that together make 2 requests more, but I just think that those calls might have not been included in the old strategy and are used for a reason, sothe numbers after the fixes could be more even between new and old.

Do you know why you don't have "NonceAt" call in the list? I remember it should be called to check for transfers within block range

IvanBelyakoff avatar Aug 11 '23 08:08 IvanBelyakoff

@IvanBelyakoff

Do you know why you don't have "NonceAt" call in the list? I remember it should be called to check for transfers within block range

on that screen it is eth_getTransactionCount

rasom avatar Aug 14 '23 11:08 rasom

https://notes.status.im/TSPntaw0SCeV44Z4ENM1sQ?view

rasom avatar Aug 16 '23 16:08 rasom

https://notes.status.im/TSPntaw0SCeV44Z4ENM1sQ?view

@rasom Thanks for the notes!

@alaibe @rasom @IvanBelyakoff I think the one thing that we DON'T want to do is implement any difference in behaviour when a user is using cellular data. These days, in many locations cellular data is fast and plentiful. Also in many locations wifi is slow. So saying that "cellular data = slow and expensive data", and "wifi data = fast and plentiful data" is incorrect. Yes it is sometimes true, but equally a lot of the time it's not true.

In the future we will be introducing to Status Desktop a bandwidth test on app startup, and we will use the results from this test to configure the various node services for the duration of that session. I haden't considered doing the same on mobile, and it's probably not needed on mobile, but if we really want to make decisions based on a user's available bandwidth the only way to do this in a sufficiently reliable fashion is to actually periodically run a test to check the user's bandwidth.

So we shouldn't do either possible actions 1 or 2 in the screenshot from the notes below:

2023-08-17_15-02-16

We need to think of possible solutions that don't factor if a user is connected to wifi or cellular into the decision. Also Status Desktop users can have low bandwidth as well! So the ideal solution works for both Status Desktop and Status Mobile users e.g. we should try do avoid doing anything Mobile specific here.

John-44 avatar Aug 17 '23 14:08 John-44

@John-44 thanks

as I mentioned in the note, the way how we can handle this for transfers history is to have a limit of around 5k RPC requests (arbitrary and it will never be very precise, also the way how we detect that threshold might differ), and stop fetching when that limit is reached. We can also add a timeframe... or just go with an old argo which made an attempt to make as few as possible RPC requests to fetch at least 20 latest transfers (if those exist). But that's technical details.

The point is that we can't guarantee that loading of entire history will go smoothly for all accounts. Thus not all accounts will get a complete history right away and we can't completely get rid of some kind of "on demand strategy". The objective of this issue was to figure out whether we can just ditch it ("on demand strategy").

rasom avatar Aug 17 '23 14:08 rasom

@rasom it looks like we cannot just ditch it. But can we hope with all the improvement suggested in the notes to have only one algorightm?

alaibe avatar Aug 17 '23 16:08 alaibe

@alaibe, do you mean one instead of "sequential" and "on demand"

As far as I can say the algorithm is identical in both cases (I mean the way how we find transfers in the range of blocks, @IvanBelyakoff please correct me if I'm wrong), the difference is what range is checked and when.

rasom avatar Aug 17 '23 17:08 rasom

As far as I can say the algorithm is identical in both cases (I mean the way how we find transfers in the range of blocks, @IvanBelyakoff please correct me if I'm wrong), the difference is what range is checked and when.

@rasom you are correct.

@alaibe Do you mean the actions that @rasom suggested - combine new transfers fetching with new balance history and history fetching with hitory balance?

Or anything else in terms of when/what range we check - like do some 5K requests then on demand? I just think that the activity for Desktop that we have developed will often have some banner in this case like "not all history is fetched, data may not be accurate" if we don't fetch all the history data. I don't think that was the goal, though I understand the limitations we have

IvanBelyakoff avatar Aug 18 '23 11:08 IvanBelyakoff

@IvanBelyakoff

I just think that the activity for Desktop that we have developed will often have some banner in this case like "not all history is fetched, data may not be accurate"

Just as a reminder: majority of account will not have history which will actually require around 5k requests, thus it is very unlikely that it will be seen often. Watch-only accounts is a different thing, but I'm not sure that we have to replace block explorers for those.

rasom avatar Aug 18 '23 14:08 rasom

@rasom on one hand I do agree with you. On the other, currently we have a "workaround" that we don't check ERC20 further back than the block with first ETH transfer. This is done because we must use some reasonable block range for getLogs call - sometimes this is a limitation of providers like POKT and Infura, but also I think if we take too big ranges, it is split into subranges on each iteration recursively and calls getLogs for that subrange and this can be quite resource hungry - both CPU and mem intensive and cause reaching requests limit per second in hystrix (though we could increase it, but I don't know if that is a good idea) and at provider as well. So we will need to check that with relatively small chunks - several thousand blocks a time and less - if we want to keep searching. AFAIK this workaround is somewhat temporary solution, because some chains have a hundred of millions blocks, so it is very long to check before we can say that all history is loaded. So now when we say that all history is checked, technically, this is not true. And in fact we should notify the users

IvanBelyakoff avatar Aug 18 '23 14:08 IvanBelyakoff

@IvanBelyakoff

split into subranges on each iteration recursively and calls getLogs for that subrange in old "on demand" startegy we would

  1. try to find the range with at least 20 outgoing transfers
  2. fetch all ETH transfers in that range
  3. Check all logs by making as little as possible getLogs calls (considering we know the limit)

So I mean, I don't really think that there is a point in making getLogs calls recursively, we need to determine the range and just go with it.

As per that 5k limit we can also try to estimate a range for each we will keep it under 5k or whatever we will choose. It will never be very precise, but still having ~50-100 transactions fetched in most cases should be more than enough.

rasom avatar Aug 18 '23 14:08 rasom

@IvanBelyakoff

split into subranges on each iteration recursively and calls getLogs for that subrange in old "on demand" startegy we would

  1. try to find the range with at least 20 outgoing transfers
  2. fetch all ETH transfers in that range
  3. Check all logs by making as little as possible getLogs calls (considering we know the limit)

So I mean, I don't really think that there is a point in making getLogs calls recursively, we need to determine the range and just go with it.

As per that 5k limit we can also try to estimate a range for each we will keep it under 5k or whatever we will choose. It will never be very precise, but still having ~50-100 transactions fetched in most cases should be more than enough.

you are talking about 20 ETH transfers right in p.1? I am talking about a situation, where we have already found the first ever block for ETH transfers (nonce is 0, balance is 0). So before that block we can't determine the range you are talking about. But there can still be incoming ERC20 transfers. Do I understand it correctly or I miss something?

IvanBelyakoff avatar Aug 18 '23 15:08 IvanBelyakoff

@IvanBelyakoff oh ok, so the simplest thing we can do for this case:

  1. At that point we know which tokens were used by that account
  2. We can check whether those had zero balance before the block with the first ETH transfer
  3. If we find that some of them are not zero we can do binary search for those specific tokens (by balance, as those are going to be only incoming txs it should be straightforward)

Then we can do getLogs for specific blocks, or well just get those blocks without getLogs. It likely will take less requests than getting smaller chunks of getLogs.

upd. if I recall correctly, we have to execute a single RPC request to get balances of all ERC20 tokens (that can be our own list of top something tokens + those which were found during history fetching) and that in most cases will be almost as good as all those getLogs (at least for verifying that there were no transfers)

rasom avatar Aug 18 '23 15:08 rasom

So that's what I get now while fetching full history on that same account, apparently nothing changed here because previously balance history requests were not counted

;; => {:total 13596,
;;     :methods
;;     ([:eth_BalanceAt 5277]
;;      [:eth_NonceAt 4323]
;;      [:eth_GetBaseFeeFromBlock 805]
;;      [:eth_TransactionReceipt 676]
;;      [:eth_HeaderByNumber 573]
;;      [:eth_BlockByNumber 556]
;;      [:eth_FullTransactionByBlockNumberAndIndex 526]
;;      [:eth_FilterLogs 342]
;;      [:eth_CallContract 235]
;;      [:eth_BlockByHash 120]
;;      [:eth_TransactionByHash 120]
;;      [:eth_CodeAt 36]
;;      [:eth_BlockNumber 3]
;;      [:eth_CallContext 2]
;;      [:eth_getBlockByNumber 1]
;;      [:eth_getBalance 1])}

that's the old screen

So unless we run old version of balance history with addition of that counting it is hard to say what's the difference, it just wasn't included.

rasom avatar Oct 23 '23 20:10 rasom

That's one hour of idle load with only one network

;; => {:total 1883,
;;     :methods
;;     ([:eth_FilterLogs 556]
;;      [:eth_CallContract 453]
;;      [:eth_HeaderByNumber 282]
;;      [:eth_BalanceAt 278]
;;      [:eth_NonceAt 278]
;;      [:eth_CodeAt 18]
;;      [:eth_BlockNumber 18])}

rasom avatar Oct 23 '23 21:10 rasom