
Maximum query array size of client.executeBatch

Open webcc opened this issue 10 years ago • 7 comments

Hi,

We would like to know the maximum query array size that can be passed to client.executeBatch(). We think it would be a good idea to document it, because we are running into problems with arrays larger than about 4,000 queries.

For the moment we are working around this by splitting the array into chunks.
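
Roughly, the workaround looks like the sketch below. The chunk size is arbitrary, and the executeBatch argument shape (queries, consistency, options, callback) is assumed here:

// Sends the queries in consecutive non-atomic batches of at most CHUNK_SIZE items.
// CHUNK_SIZE is arbitrary; anything that stays well below the failing size works for us.
var CHUNK_SIZE = 4000;

function executeInChunks(client, queries, consistency, callback) {
  if (queries.length === 0) {
    return callback();
  }
  var chunk = queries.slice(0, CHUNK_SIZE);
  client.executeBatch(chunk, consistency, {atomic: false}, function (err) {
    if (err) {
      return callback(err);
    }
    // Recurse on the remainder once the current chunk has been acknowledged.
    executeInChunks(client, queries.slice(CHUNK_SIZE), consistency, callback);
  });
}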

webcc • Mar 08 '14 11:03

As far as I know, there is no limit on batch size at the protocol level or in CQL.

Are the INSERT / UPDATE queries for the same partition key? If not, I don't think it is a good idea to batch atomic queries across a large number of partitions...

Another consideration is that, to batch a large number of queries, you have to build all of those queries and parameters in memory. You also send all of that data over the wire "serially"...

jorgebay • Mar 10 '14 11:03

There seems to be a limit. To give you an idea of the issue, we are sending around 150,000 INSERTs for a given primary key. That generates an exception in FrameWriter. If we split the array of queries into chunks of 5,000 items or fewer, the problem disappears.

Could you tell us what the options in queryFlag do? In particular, we would like to know what the pageSize property does; it seems to influence query performance when we set it in the driver configuration.

And by the way, many thanks for this excellent piece of software.

webcc • Mar 12 '14 18:03

Thanks!

queryFlag does not affect the batch in any way; pageSize is used by Cassandra only for SELECT queries and is ignored for all others.

I still think it is not a good idea to batch such a large number of queries. Consider that if each query takes around 50 bytes on average (depending on the size of the query and parameters), 150,000 queries add up to more than 7 MB of data in memory, which is then transferred over the wire. Is there a reason to do such large operations?
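
As a rough back-of-the-envelope check (the 50 bytes per query is only an assumed average, not a measured figure):

// Rough estimate of the in-memory size of the batch before it is written to the socket.
var avgBytesPerQuery = 50;      // assumed average; depends on query text and parameters
var queryCount = 150000;
var totalMB = (avgBytesPerQuery * queryCount) / (1024 * 1024);
console.log(totalMB.toFixed(2) + ' MB');  // ~7.15 MB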

Also, if possible, use non-atomic batches (atomic batches have a performance impact):

client.executeBatch(queries, consistency, {atomic: false}, callback);

If you are getting an error from the FrameWriter, please post it.

jorgebay • Mar 12 '14 21:03

Hi Jorge,

Could you provide us with a short comment on what the option {atomic: false} (versus {atomic: true}) exactly does?

darthcav • Mar 26 '14 13:03

It's atomic in database terms: if any part of the batch succeeds, all of it will.

More info: Atomic batches in Cassandra
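
For contrast with the earlier example, the atomic (logged) variant should just be the same call with the flag flipped:

// Assumed counterpart of the non-atomic example above: request a logged, atomic batch.
client.executeBatch(queries, consistency, {atomic: true}, callback);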

jorgebay • Mar 26 '14 15:03

I'm seeing the same thing. It took me a while to track down because only a few of my INSERTs fail, with the exception TypeError: value is out of bounds. At first I thought it was due to incorrect type coercion of very long IDs (Twitter IDs, 64-bit ints that I'm storing as strings).

The problem stems from the following code:

FrameWriter.prototype.writeShort = function(num) {
  var buf = new Buffer(2);
  buf.writeUInt16BE(num, 0);
  this.buffers.push(buf);
};

The parameter num in one of my failure cases, for example, is 197136. I looked up writeUInt16BE in the Node docs, and some simple math tells me that 197136 is well outside the 2^16 values (0–65535) that fit in an unsigned 16-bit integer.
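
A quick standalone check reproduces the overflow (the exact error type depends on the Node version; older releases throw the TypeError quoted above, newer ones a RangeError):

// writeUInt16BE only accepts values that fit in two bytes (0..65535),
// so passing something like 197136 to FrameWriter.writeShort blows up.
var buf = new Buffer(2);
buf.writeUInt16BE(65535, 0);   // fine: largest value that fits in 16 bits
buf.writeUInt16BE(197136, 0);  // throws: value is out of bounds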

Now, with no knowledge of the underlying Cassandra wire protocol, my question is: is it possible to step this value up to perhaps 2^32? I realize that batches of this size are probably not recommended, but for these particular transactions I need them to be that big to remain atomic. This particular insert is around 12 MB uncompressed as JSON.

dsimmons • Aug 21 '14 17:08

@jorgebay - Wouldn't it be more descriptive to say that if any part of the batch fails, the entire batch fails? I suppose the two are equivalent, but typically "what happens when something fails?" is the main concern.

adam-roth • Jan 13 '16 06:01