BanksClient hangs when processing multiple transactions
This seems like a very easy problem to run into, so maybe I've missed something, but I couldn't find an existing issue for it.
Problem
If you call banks_client.process_transactions with more than one transaction, there is a chance that it hangs for 60 seconds and then exits. In my runs that chance seems to be over 90% even with just 4 transactions, and with 10 transactions it seems to hang every time.
There's a full demo here.
Here are the logs when sending 10 transactions:
[2023-02-25T12:00:18.103207144Z DEBUG solana_runtime::message_processor::stable_log] Program 11111111111111111111111111111111 invoke [1]
[2023-02-25T12:00:18.103303874Z TRACE solana_runtime::system_instruction_processor] process_instruction: Transfer { lamports: 1000000 }
[2023-02-25T12:00:18.103365823Z DEBUG solana_runtime::message_processor::stable_log] Program 11111111111111111111111111111111 success
[2023-02-25T12:00:18.104895976Z DEBUG solana_runtime::message_processor::stable_log] Program 11111111111111111111111111111111 invoke [1]
[2023-02-25T12:00:18.104954713Z TRACE solana_runtime::system_instruction_processor] process_instruction: Transfer { lamports: 1000001 }
[2023-02-25T12:00:18.104982370Z DEBUG solana_runtime::message_processor::stable_log] Program 11111111111111111111111111111111 success
[2023-02-25T12:00:18.106394770Z DEBUG solana_runtime::message_processor::stable_log] Program 11111111111111111111111111111111 invoke [1]
[2023-02-25T12:00:18.106507075Z TRACE solana_runtime::system_instruction_processor] process_instruction: Transfer { lamports: 1000003 }
[2023-02-25T12:00:18.106556313Z DEBUG solana_runtime::message_processor::stable_log] Program 11111111111111111111111111111111 success
test test_process_transactions has been running for over 60 seconds
I've noticed it usually processes only 3 or 4 transactions before hanging.
I have also tried using the singular process_transaction several times asynchronously and encountered the same problem.
I tried the following older versions locally and got the same hanging behaviour:
- 1.13.6
- 1.11.10
- 1.9.29
- 1.7.17
Some more observations: when the test hangs with two transactions, the fee_collection_results in load_execute_and_commit_transactions look like [Ok(()), Err(AccountInUse)]. The part that hangs is the call to self.poll_signature_status in banks_server.process_transaction_with_commitment_and_context. The hang happens because bank.get_signature_status_with_blockhash returns None for the transaction that failed, so the poll never gets a status to report.
However, I don't think this is supposed to happen. If I make it run into a different error, e.g. InsufficientFundsForRent because the transfer amount is too small, then get_signature_status_with_blockhash works and finds the failed transaction, so BanksClient can see the error.
Ok, so the AccountInUse error happens in prepare_entry_batch -> .lock_accounts(). This means the transaction doesn't get committed so there's nothing for get_signature_status_with_blockhash to see.
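To make the mechanism above concrete, here is a toy model in plain Rust. It is only a sketch under my own assumptions: ToyBank, Tx, process_batch, poll_signature_status and MIN_TRANSFER are made-up stand-ins, not the real Solana types or functions, and max_polls exists only so the sketch terminates. It models a single batch, so it does not reproduce the "3 or 4 succeed" pattern from the logs.

```rust
use std::collections::{HashMap, HashSet};

// Toy stand-ins only; these are NOT the real Solana types.
#[derive(Debug, Clone, PartialEq)]
enum TxError {
    AccountInUse,
    InsufficientFundsForRent,
}

struct Tx {
    signature: u64,
    writable_accounts: Vec<&'static str>,
    lamports: u64,
}

// Arbitrary threshold, used only to trigger an execution-time error.
const MIN_TRANSFER: u64 = 1_000;

#[derive(Default)]
struct ToyBank {
    // Only transactions that reach the commit step are recorded here.
    status_cache: HashMap<u64, Result<(), TxError>>,
}

impl ToyBank {
    /// Processes one batch: a transaction must take its account locks
    /// before it is executed and committed.
    fn process_batch(&mut self, batch: &[Tx]) -> Vec<Result<(), TxError>> {
        let mut locked: HashSet<&str> = HashSet::new();
        let mut results = Vec::new();
        for tx in batch {
            // Locking stage: a conflict rejects the transaction outright.
            if tx.writable_accounts.iter().any(|a| locked.contains(a)) {
                results.push(Err(TxError::AccountInUse));
                continue; // never committed, so nothing is recorded
            }
            locked.extend(tx.writable_accounts.iter().copied());

            // Execution stage: failures here are still committed.
            let result = if tx.lamports < MIN_TRANSFER {
                Err(TxError::InsufficientFundsForRent)
            } else {
                Ok(())
            };
            self.status_cache.insert(tx.signature, result.clone());
            results.push(result);
        }
        results
    }

    fn get_signature_status(&self, signature: u64) -> Option<Result<(), TxError>> {
        self.status_cache.get(&signature).cloned()
    }
}

/// Polls for a status; gives up after `max_polls` attempts so the sketch
/// terminates instead of hanging.
fn poll_signature_status(
    bank: &ToyBank,
    signature: u64,
    max_polls: u32,
) -> Option<Result<(), TxError>> {
    for _ in 0..max_polls {
        if let Some(status) = bank.get_signature_status(signature) {
            return Some(status);
        }
    }
    None
}
```

In this model, two transfers from the same payer in one batch give [Ok(()), Err(AccountInUse)] from process_batch, yet polling for the second signature never finds anything. A transfer that fails with InsufficientFundsForRent is recorded and therefore visible to the poller.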
Is there something we could do to make errors in uncommitted transactions visible to BanksClient? It's not fun having them go into a black hole.
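One possible shape for this, purely as a hypothetical sketch and not a proposal for the actual code: keep rejections from the locking stage in a separate map that the status lookup also consults. StatusSources, record_rejection and lookup are names I made up for illustration. Nothing like them exists in the codebase as far as I know.

```rust
use std::collections::HashMap;

// Toy error type; NOT the real Solana TransactionError.
#[derive(Debug, Clone, PartialEq)]
enum TxError {
    AccountInUse,
}

/// Hypothetical pair of lookup sources for a signature's status.
#[derive(Default)]
struct StatusSources {
    // Statuses of committed transactions (what a poller can see today).
    committed: HashMap<u64, Result<(), TxError>>,
    // Hypothetical addition: errors for transactions rejected before commit.
    rejected: HashMap<u64, TxError>,
}

impl StatusSources {
    /// Records an error for a transaction that never reached the commit step.
    fn record_rejection(&mut self, signature: u64, err: TxError) {
        self.rejected.insert(signature, err);
    }

    /// Checks committed statuses first, then falls back to recorded rejections.
    fn lookup(&self, signature: u64) -> Option<Result<(), TxError>> {
        self.committed
            .get(&signature)
            .cloned()
            .or_else(|| self.rejected.get(&signature).cloned().map(Err))
    }
}
```

With something along these lines, a poller would get Some(Err(AccountInUse)) for a rejected signature instead of waiting on None forever. A signature that was never submitted would still return None.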