dcrdex icon indicating copy to clipboard operation
dcrdex copied to clipboard

ws: Handle messages that are too large more gracefully.

Open JoeGruffins opened this issue 4 years ago • 12 comments
trafficstars

Sending a message that is too large will cause the ws to disconnect temporarily. It would be better to stop these and error before sending. It looks like from the logs that it is reading, so I guess the server is at fault? Perhaps sending messages in parts if too big is the better answer...

Can reproduce with the simnet harness by selling 200,000 or so dcr.

This is the client.

2021-06-11 11:38:19.610 [ERR] CORE[wss://127.0.0.1:17273/ws]: read error (websocket: close 1009 (message too big)), attempting reconnection
2021-06-11 11:38:19.610 [DBG] CORE: notify: |POKE| (conn) DEX disconnected - DEX at 127.0.0.1:17273 has disconnected
2021-06-11 11:38:19.610 [INF] CORE[wss://127.0.0.1:17273/ws]: Attempting to reconnect to wss://127.0.0.1:17273/ws...
2021-06-11 11:38:19.640 [DBG] CORE: notify: |POKE| (conn) DEX connected - DEX at 127.0.0.1:17273 has connected

JoeGruffins avatar Jun 11 '21 02:06 JoeGruffins

Impressive. The read limit for authenticated connections is 64 KiB, so you must have been funding an order with a shit ton of UTXOs.

https://github.com/decred/dcrdex/blob/db3df625eb736442a464dcc3aea424e8a8491a78/server/comms/link.go#L107-L109

Not sure the right resolution at this time though.

chappjc avatar Jun 11 '21 04:06 chappjc

It looks like from the logs that it is reading, so I guess the server is at fault? Perhaps sending messages in parts if too big is the better answer...

Two possible fix for this includes

  1. Increase read limit for authenticated connections.
  2. Split large messages and send in parts.
  3. Send an error message but don't disconnect websocket.

@chappjc which way to go?

ukane-philemon avatar Mar 12 '22 05:03 ukane-philemon

(1) was done in https://github.com/decred/dcrdex/commit/fb7bb9da37a60f7cd6f4bea7707d11c31fd77610 Not sure about the other options.

chappjc avatar Mar 12 '22 19:03 chappjc

(1) was done in fb7bb9d

This solves error with large messages for now(until the next larger message :) )

Adding other options should wait?

ukane-philemon avatar Mar 12 '22 20:03 ukane-philemon

You may propose other solutions. Not disconnecting sounds like a reasonable first step though, unless it gets too messy.

chappjc avatar Mar 12 '22 21:03 chappjc

Not disconnecting sounds like a reasonable first step though, unless it gets too messy.

seems increasing the read limit is our best shot for now since it immediately reconnects if ws was disconnected due to read error.

ukane-philemon avatar Mar 13 '22 17:03 ukane-philemon

This is related to another problem with the client db.

2022-05-17 17:37:34.453 [INF] CORE: Notifying DEX 127.0.0.1:17273 of our dcr swap contract 893bc4b9bbaa1697a76c65685647077c7efe8e1f62aaf231d2f8424829350001:0 for match ce5a37cc2f4b233d48b4ebf6151e62d1762ec18f40a9d98c3bb095263535963f
2022-05-17 17:37:34.453 [DBG] CORE: notify: |POKE| (order) Swaps initiated - Sent swaps worth 210000.00000000 DCR on order 5d8116af - Order: 5d8116afc623d1f4a07226459bcc30f6bef252ce6e8966b646c2413d6e943a71
2022-05-17 17:37:34.453 [TRC] CORE: Checking match ce5a37cc2f4b233d48b4ebf6151e62d1762ec18f40a9d98c3bb095263535963f (Taker) in status NewlyMatched. Order: 4caffa9647de786e9ee11a447e4d7a70f97dca218250e07dc84f4d1f007211de, Refund coin: , ContractData: , Revoked: false
2022-05-17 17:37:34.457 [TRC] CORE: Checking confirmations on our OWN swap txn 893bc4b9bbaa1697a76c65685647077c7efe8e1f62aaf231d2f8424829350001:0 (dcr)...
2022-05-17 17:37:34.458 [TRC] CORE: notify: |DATA| (match)
2022-05-17 17:37:34.459 [INF] CORE: Contract for match ce5a37cc2f4b233d48b4ebf6151e62d1762ec18f40a9d98c3bb095263535963f with swap coin 893bc4b9bbaa1697a76c65685647077c7efe8e1f62aaf231d2f8424829350001:0 (DCR) expires at 2022-05-17 08:39:30 +0000 UTC (1m56s).
2022-05-17 17:37:34.460 [WRN] CORE: checkTrades completed in 4.340427614s (slow)
2022-05-17 17:37:34.465 [TRC] CORE: notify: |DATA| (balance) balance updated
2022-05-17 17:37:34.465 [INF] CORE: runJob(match) completed in 4.457320633s
2022-05-17 17:37:34.485 [DBG] CORE: Received valid ack for 'init' request for match ce5a37cc2f4b233d48b4ebf6151e62d1762ec18f40a9d98c3bb095263535963f
panic: cannot use addData for pushes > 65535 bytes

goroutine 4148 [running]:
decred.org/dcrdex/dex/encode.BuildyBytes.AddData(...)
        /home/joe/git/dcrdex/dex/encode/encode.go:177
decred.org/dcrdex/client/db.(*MatchProof).Encode(0xc0018c21c0)
        /home/joe/git/dcrdex/client/db/types.go:433 +0x268e
decred.org/dcrdex/client/db/bolt.(*BoltDB).UpdateMatch.func1(0xc0025d60f8?, 0x1ed6c70?)
        /home/joe/git/dcrdex/client/db/bolt/db.go:1227 +0x32e
decred.org/dcrdex/client/db/bolt.(*BoltDB).matchesUpdate.func1(0x1f86630?)
        /home/joe/git/dcrdex/client/db/bolt/db.go:1394 +0x8a
go.etcd.io/bbolt.(*DB).Update(0x24?, 0xc0018e5c98)
        /home/joe/go/pkg/mod/go.etcd.io/[email protected]/db.go:741 +0x82
decred.org/dcrdex/client/db/bolt.(*BoltDB).matchesUpdate(0x4c6876?, 0xeda1555ca?)
        /home/joe/git/dcrdex/client/db/bolt/db.go:1385 +0x38
decred.org/dcrdex/client/db/bolt.(*BoltDB).UpdateMatch(0x0?, 0xeda155642?)
        /home/joe/git/dcrdex/client/db/bolt/db.go:1213 +0x8c
decred.org/dcrdex/client/core.(*trackedTrade).auditContract(0xc00033a500, 0xc0011443c0, {0xc0009e0240, 0x24, 0x24}, {0xc0025da0e0, 0x61, 0x61}, {0xc0026ba000, 0x1952e, ...})
        /home/joe/git/dcrdex/client/core/trade.go:2605 +0x831
decred.org/dcrdex/client/core.(*trackedTrade).processAuditMsg.func1()
        /home/joe/git/dcrdex/client/core/trade.go:2429 +0x99
created by decred.org/dcrdex/client/core.(*trackedTrade).processAuditMsg
        /home/joe/git/dcrdex/client/core/trade.go:2427 +0x59e

Maybe just a sensible limit on the number of coins that can fund an order? Can be measured in serialized bytes.

JoeGruffins avatar May 17 '22 08:05 JoeGruffins

Well that's unfortunate. The only way to remove the 65535 hard-coded limit is to do a DB upgrade and change the data type BuildyBytes uses to encode length of data from uint16 to uint32.

In retrospect, it's not too surprising that CountTxData (the actual swap transaction for utxo assets) would get above the limit. Maybe we need an encoder for this that splits it up into chunks, as much as that sucks.

Maybe just a sensible limit on the number of coins that can fund an order? Can be measured in serialized bytes.

Seems sensible to me.

chappjc avatar May 17 '22 13:05 chappjc

Another wilder possibility would be for AuditInfo to include an optional stripped tx data, which for utxo assets would just have witness data and sigscripts zeroed, or even the inputs removed entirely inputs retained for txid. Then that could be stored in the match proof in place of the full signed tx. Only minor edge that would create is if a re-audit on restart of dexc did a rebroadcast it would likely log an error about an incomplete tx, but the audit would still succeed since it only needs the one tx out.

chappjc avatar May 17 '22 13:05 chappjc

The only way to remove the 65535 hard-coded limit is to do a DB upgrade and change the data type BuildyBytes uses to encode length of data from uint16 to uint32.

There may be a soft encoding change that requires no upgrade, rewriting of data or wasting space on the length bytes. It's a cludge, but I'm considering this: https://github.com/decred/dcrdex/commit/f32c103d29bfab804c9a0438e75c272d3c9baabd EDIT: now merged in https://github.com/decred/dcrdex/pull/1620

We should still place reasonable limits on the number of funding coins though.

chappjc avatar May 17 '22 15:05 chappjc

We should still place reasonable limits on the number of funding coins though.

Low hanging fruit .. and maybe early warning.

dev-warrior777 avatar Jul 20 '23 15:07 dev-warrior777

Low hanging fruit .. and maybe early warning.

Yeah, still an error, but at least one that says what the issue is rather than having a comms error about message size.

chappjc avatar Jul 20 '23 15:07 chappjc