bitcore-node icon indicating copy to clipboard operation
bitcore-node copied to clipboard

Bitcoin JSON-RPC: Work queue depth exceeded' while reindexing

Open levino opened this issue 9 years ago • 39 comments

Getting these errors occasionally. Nothing to worry about?

[2016-06-27T15:30:22.756Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
    at Bitcoin._wrapRPCError (/usr/src/app/node_modules/bitcore-node/lib/services/bitcoind.js:449:13)
    at /usr/src/app/node_modules/bitcore-node/lib/services/bitcoind.js:567:26
    at IncomingMessage.<anonymous> (/usr/src/app/node_modules/bitcore-node/node_modules/bitcoind-rpc/lib/index.js:91:9)
    at emitNone (events.js:72:20)
    at IncomingMessage.emit (events.js:166:7)
    at endReadableNT (_stream_readable.js:913:12)
    at nextTickCallbackWith2Args (node.js:442:9)
    at process._tickCallback (node.js:356:17)

levino avatar Jun 27 '16 15:06 levino

I was able to reproduce. The issue is, that I do something like this.



BitcoreNotifier.prototype.transactionHandler = function (transactionBuffer) {
  var self = this;
  var tx = new Transaction().fromBuffer(transactionBuffer);
  log.trace({
    txid: tx.hash
  }, 'Handling transaction');
  this.node.services.bitcoind.getDetailedTransaction(tx.hash, function(err, transaction) {
    if (err) {
      return log.error({
        txid: tx.hash
      }, 'Error getting detailed Transaction from Bitcore node.');
    }
    log.debug({
      txid: transaction.txid
    }, 'Got insight api transformed tx');
    self.webhooks.forEach(function (webhook) {
      processWebhook(transaction, webhook, function (err) {
        if (err) {
          log.error({
            error: err
          }, 'Failed to process webhook');
        }
      });
    });
  });
};

// Interface

BitcoreNotifier.prototype.start = function (callback) {
  this.node.services.bitcoind.on('tx', this.transactionHandler.bind(this));
  log.info('BitcoreNotifier started.');
  setImmediate(callback);
};

The idea is to send every transaction to another service. The problem is, that when a new block comes in, the event fires for all transactions at once. So the function

node.services.bitcoind.getDetailedTransaction

is called many times in parallel (something around 3k times), which causes BitcoinD to refuse to take any more jobs over the rpc interface.

There should be some queuing in place to prevent this. So far I consider it a bug.

levino avatar Jun 27 '16 15:06 levino

You may need to use async.eachSeries, async.mapLimit, or use some sort of queue. You can also increase the workqueue limit in bitcoin.conf.

braydonf avatar Jun 27 '16 16:06 braydonf

That might mitigate, but I still have no control of how many queries are coming at the same time through insight-api or some other service...

levino avatar Jun 27 '16 19:06 levino

You can add more bitcoind processes to handle the additional load if necessary: https://github.com/bitpay/bitcore-node/blob/master/docs/services/bitcoind.md#configuration

braydonf avatar Jun 27 '16 19:06 braydonf

Also, here is the configuration option for the work queue limit:

rpcworkqueue=<n>

Sets the depth of the work queue to service RPC calls, the default is 16.

braydonf avatar Jun 27 '16 19:06 braydonf

Thank you, @braydonf. You can close this or keep it open as a feature request. Would appreciate a global queuing. Btw. I will try async.queue for anyone doing the same. Much easier in this context I assume.

levino avatar Jun 27 '16 20:06 levino

I'm getting this during normal reindex.

Not doing anything special, just running bitcored, reindexing, and then I get this after 3% indexed

dabura667 avatar Jul 05 '16 01:07 dabura667

Really? Is it for testnet or livenet? I used to see this, however it shouldn't be an issue now that zmq events are subscribed to after 99.99% synced. There can be many rapid tip updates resulting in many RPC calls, and while syncing it should poll at a slower interval for updates.

braydonf avatar Jul 05 '16 01:07 braydonf

livenet

Hmm... I tried increased dbcache to 1024

I will try to set rpcworkqueue to like 32 for now

dabura667 avatar Jul 05 '16 02:07 dabura667

I was fully synched until a few hours ago, when I started reindex but then it crashed and now I am restarting reindex...

Can I resume index or do I need to restart the reindex every crash?

dabura667 avatar Jul 05 '16 02:07 dabura667

You should be able to resume without restarting the reindex.

braydonf avatar Jul 05 '16 02:07 braydonf

Another option, you can run bitcoind standalone (./bin/bitcoind) while reindexing.

braydonf avatar Jul 05 '16 02:07 braydonf

Resuming without reindex caused all the services to start and it keeps saying I found the same block over and over on livenet...

I'm going to reindex using 64 rpcqueue

dabura667 avatar Jul 05 '16 04:07 dabura667

Something else: As far as I understand queries to the API now immediatly trigger queries to bitcoind (rpc calls). Like "additional transaction information" on the /tx endpoint. Does this scale? What about 10k queries hitting the server at the same time? Are there any load tests run in ci?

levino avatar Jul 07 '16 19:07 levino

Regarding getting this error while reindexing, I've run into this also. I think we may be able to mitigate by making sure to delay polling if we've not yet received a reply.

There are a few load tests around receiving many transactions into the mempool and emitting those events.

braydonf avatar Aug 12 '16 14:08 braydonf

I just made an issue here: https://github.com/bitpay/insight-api/issues/492

Getting the error from the insight api just by running multiple requests in parallel. As the current implementation is passing each query on to the bitcoind rpc interface this is exactly what should happen. All queries must be queued globally.

levino avatar Aug 12 '16 14:08 levino

Running multiple nodes, and internal node, is the means to handle more requests.

braydonf avatar Aug 12 '16 14:08 braydonf

What do you mean by this? Multiple bitcoind nodes? I can overload your deployment on https://insight.bitpay.com/api with just 200 queries in parallel...

And by overload I mean that others will probably get the work queue error too, even if they send a single request at that time.

levino avatar Aug 12 '16 14:08 levino

I get this error too. I get the Work queue depth exceeded Error a lot of times. It's not while reindexing, it's with regular use. Are there any solutions?

elichai avatar Aug 20 '16 18:08 elichai

Go back to older insight api version (and old bitcore node) or query less and / or slower and repeat failed requests.

levino avatar Aug 21 '16 05:08 levino

I'm also getting this error when syncing a new insight installation.

This is happening with very few API requests. I'm the only person accessing the insight node, and all I do is refresh the /insight/status page periodically to watch the sync percentage climb.

edit: It happened just now without ANY requests to the insight webserver. It's failing from the sync itself. Here's the output when it started failing:

info New livenet block:  000000000000000001637cdd2af21dd2acd41488266fd890943b86ce12a2ac0a
[2016-12-24T00:56:52.389Z] info: Bitcoin Height: 439180 Percentage: 98.44
info New livenet block:  0000000000000000033a21bc3e4c789c746e5890100a8c4a34779584394ffa0a
[2016-12-24T00:57:13.412Z] info: Bitcoin Height: 439209 Percentage: 98.45
info New livenet block:  000000000000000001f42c4371f4d1d0de46dd157947d4a97df547ffdb4b1f8f
[2016-12-24T00:57:22.699Z] info: Bitcoin Height: 439235 Percentage: 98.45
info New livenet block:  000000000000000000ec0bde7f711b7d39749b29463795b964aa6a1f2b359699
[2016-12-24T00:57:37.850Z] info: Bitcoin Height: 439261 Percentage: 98.46
info New livenet block:  0000000000000000012ea2dd8035eb500d64ee759d3f7ee6e3b61b3878925bf1
[2016-12-24T00:57:54.115Z] info: Bitcoin Height: 439285 Percentage: 98.47
[2016-12-24T01:04:32.856Z] info: Bitcoin Height: 439756 Percentage: 98.61
info New testnet block:  0000000000000292ee9cf7ef4375ba3661cbe396b247779155815d7a729a5791
[2016-12-24T01:10:07.522Z] info: Bitcoin Height: 440222 Percentage: 98.75
info New livenet block:  000000000000000003736b0f5dd20ed8b01159a4679cfde2d107c6d1e431cc41
[2016-12-24T01:10:27.186Z] info: Bitcoin Height: 440240 Percentage: 98.76
[2016-12-24T01:15:35.044Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:15:50.044Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:16:05.045Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:16:20.046Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:16:35.047Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:16:50.053Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:17:05.050Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:17:20.051Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:17:35.052Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:17:50.053Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:18:05.054Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:18:20.054Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:18:35.056Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:18:50.057Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:19:05.061Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:19:20.060Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:19:35.061Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:19:50.062Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:20:05.063Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:20:20.064Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:20:35.065Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:20:50.066Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:21:05.068Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:21:20.069Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:21:35.069Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:21:50.071Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:22:05.072Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:22:20.072Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:22:35.073Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:22:50.073Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:23:05.075Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded
[2016-12-24T01:23:20.076Z] error: RPCError: Bitcoin JSON-RPC: Work queue depth exceeded

toomim avatar Dec 24 '16 01:12 toomim

Which older version should we revert to?

toomim avatar Dec 24 '16 01:12 toomim

FWIW, you can run bitcoind alone while syncing, and you will not have any issues.

braydonf avatar Dec 24 '16 01:12 braydonf

What is possibly happening is that bitcore-node is starting to listen to tx events too early and is getting flooded with new transaction events from transactions within blocks. You can increase the workqueue depth in bitcoin.conf (rpcworkqueue=<n>), but that likely won't help here.

braydonf avatar Dec 24 '16 01:12 braydonf

I also get this error. I consider it a logic error when the data access strategy leads to non-deterministic behavior on a heavily-loaded server. This is exacerbated by LevelDB which was designed for dedicated servers, which is not a great choice when used as an embedded DB and strapped to a monolith like Bitcore+plugins.

drauschenbach avatar Feb 04 '17 14:02 drauschenbach

I see this error frequently while performing a fresh sync of livenet over bitecored with insight-api and insight-web enabled but no clients connected.

CaptEmulation avatar Mar 14 '17 20:03 CaptEmulation

Hello. We have this error frequently (on mytrezor). I want to fix this by using a queue of requests instead of doing all requests at the same time.

I am not sure whether to fix it in bitcore-node in /lib/services/bitcoind, or "deeper" in repo bitpay/bitcoind-rpc. And it it's better to use async (which you use already in here); or "queue" from npm (which is easier for me to use :))

It seems to me it will be "easier" in bitcoind-rpc, but whatever will be merged faster :)

karelbilek avatar Mar 20 '17 12:03 karelbilek

You also need spam protection at the web request level, probably as some express middleware in order to prevent individuals from flooding the queue. Something like "only 20 requests per second".

levino avatar Mar 20 '17 13:03 levino

I have added queue at the rpc level as a PR. On our end it fixes the errors so far. If the spamming is a problem, I can add it on the other end as well, but it's a bit harder.

https://github.com/bitpay/bitcoind-rpc/pull/23

karelbilek avatar Mar 22 '17 23:03 karelbilek

I'm getting this error just trying to sync the livenet.

After doing ./bin/bitcoind i get

Error: Error loading wallet.dat: Wallet requires newer version of Bitcoin Core

@runn1ng any idea how I can get Bitcore to use your branch instead?

Sexual avatar May 14 '17 12:05 Sexual