clio
Deal with missing transactions
There are some old transactions that rippled cannot deserialize, and thus does not return to clio via ETL. We still need a way to get these transactions, at least as raw blobs, store them in the database, and return them to clients as raw blobs.
Do you have any examples of these? I have just binary deserialized all transactions using ripple-binary-codec, and have not had any issues (https://github.com/XRPLF/xrpl.js/tree/main/packages/ripple-binary-codec#readme).
Here are some ledgers that appear to contain transactions that cannot be deserialized. Rippled does not even return these transactions over the API, in any form. It catches an exception, and just returns what it can deserialize. The only way we caught this is we were recomputing hashes for the transaction map for every ledger, and a few hashes were incorrect, pointing to missing transactions.
COULD NOT VERIFY LEDGER TX 562177
COULD NOT VERIFY LEDGER TX 6409247
COULD NOT VERIFY LEDGER TX 7266393
COULD NOT VERIFY LEDGER TX 7266396
I don't know what the transactions are themselves. We would need to modify the rippled code to at the very least print out the tx blob when it tries to deserialize and catches an exception.
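The hash check described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual verifier: it assumes the transaction tree layout as I understand it from rippled and ripple-binary-codec (SHA-512Half, the `TXN\0` / `SND\0` / `MIN\0` hash prefixes, length-prefixed tx and metadata blobs in each leaf), and all function names are made up for the example. Check it against a known-good ledger before relying on it.

```python
import hashlib

ZERO = bytes(32)  # hash of an empty branch / empty tree

def sha512_half(data: bytes) -> bytes:
    """First 32 bytes of SHA-512."""
    return hashlib.sha512(data).digest()[:32]

def encode_vl(blob: bytes) -> bytes:
    """Prefix a blob with an XRPL-style variable-length header (assumed format)."""
    n = len(blob)
    if n <= 192:
        head = bytes([n])
    elif n <= 12480:
        n -= 193
        head = bytes([193 + (n >> 8), n & 0xFF])
    elif n <= 918744:
        n -= 12481
        head = bytes([241 + (n >> 16), (n >> 8) & 0xFF, n & 0xFF])
    else:
        raise ValueError("blob too long for VL encoding")
    return head + blob

def leaf(tx_blob: bytes, meta_blob: bytes):
    """Return (key, leaf_hash) for one transaction plus its metadata."""
    key = sha512_half(b"TXN\x00" + tx_blob)  # transaction ID
    body = encode_vl(tx_blob) + encode_vl(meta_blob)
    return key, sha512_half(b"SND\x00" + body + key)

def subtree_hash(leaves: dict, depth: int) -> bytes:
    """Hash an inner node whose leaves all share their first `depth` nibbles."""
    buckets = [dict() for _ in range(16)]
    for key, h in leaves.items():
        byte = key[depth // 2]
        nibble = byte >> 4 if depth % 2 == 0 else byte & 0x0F
        buckets[nibble][key] = h
    parts = []
    for bucket in buckets:
        if not bucket:
            parts.append(ZERO)
        elif len(bucket) == 1:
            parts.append(next(iter(bucket.values())))
        else:
            parts.append(subtree_hash(bucket, depth + 1))
    return sha512_half(b"MIN\x00" + b"".join(parts))

def transaction_tree_hash(txs) -> bytes:
    """Root hash over an iterable of (tx_blob, meta_blob) pairs."""
    leaves = dict(leaf(tx, meta) for tx, meta in txs)
    return subtree_hash(leaves, 0) if leaves else ZERO
```

If the root computed from the transactions rippled returned differs from the `transaction_hash` in the ledger header, at least one transaction is missing or altered, which is how the ledgers listed above were flagged.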
Looks like a rippled problem. It should be fixed at the source, not in clio.
I don't think it makes sense to close this. The issue is not fixed, because clio is still missing transactions, which means clio cannot fulfill its API promise. I don't think the cause is relevant: clio is still not behaving as promised or desired.
The route forward here would be to modify the gRPC handlers in rippled which clio uses to extract data, and to at least return the transactions as raw binary. Don't even try to deserialize them. While yes, this code lives in rippled, it was written exclusively for clio, by me, and is a part of rippled that is really owned by clio and the clio team. @injaelee
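To illustrate the idea of always passing the raw bytes through, here is a small Python sketch. It is not the rippled gRPC handler code (that is C++); `ExtractedTx`, `extract_transactions`, and the `decode` callback are hypothetical names used only to show the shape of the fallback.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

@dataclass
class ExtractedTx:
    raw: bytes               # always present, even if decoding fails
    parsed: Optional[dict]   # None when the blob could not be deserialized
    error: Optional[str]     # reason for the failure, if any

def extract_transactions(
    blobs: Iterable[bytes],
    decode: Callable[[bytes], dict],
) -> List[ExtractedTx]:
    """Return one entry per blob; never drop a transaction on a decode error."""
    out = []
    for blob in blobs:
        try:
            out.append(ExtractedTx(raw=blob, parsed=decode(blob), error=None))
        except Exception as exc:  # keep the blob instead of skipping it
            out.append(ExtractedTx(raw=blob, parsed=None, error=str(exc)))
    return out
```

With this shape the consumer receives exactly as many entries as the ledger has transactions, so the transaction tree hash can still be recomputed from the `raw` fields even when some entries have no parsed form.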
@cjcobb23 we are happy to work towards fixing it if you can provide more info on reproducing this.
You have to find a rippled server with enough history, and extract one of these ledgers:
562177
6409247
7266393
7266396
clio will just skip over the bad transaction, but the verifier script will throw an error when you try to verify the ledger.
Also, running the verifier script on a full history clio server will let you know which ledgers have transactions that cause this.
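A scan like the one described could look roughly like this in Python. This is only a sketch of the loop, not the actual verifier script: `expected_hash` and `recomputed_hash` are hypothetical callbacks that would, respectively, read the transaction hash from the ledger header and recompute it from the transactions clio has stored.

```python
from typing import Callable, Iterable, List

def find_unverifiable_ledgers(
    sequences: Iterable[int],
    expected_hash: Callable[[int], bytes],
    recomputed_hash: Callable[[int], bytes],
) -> List[int]:
    """Return the ledger sequences whose transaction tree hash does not match."""
    bad = []
    for seq in sequences:
        if recomputed_hash(seq) != expected_hash(seq):
            print(f"COULD NOT VERIFY LEDGER TX {seq}")
            bad.append(seq)
    return bad
```

Run over the full history range, the returned list would contain the ledgers that need closer inspection.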
I seem to be running into this issue when syncing from a particular start sequence, but clio 2.0 won't progress beyond the ledger sequence that has the undeserializable transactions. For me, clio can't deserialize ledger 75449940 and thus doesn't proceed to the next ledger sequence. A temporary fix is to modify the ledger_range table in the clio keyspace to skip over the affected ledger sequence. It would be nice to get some closure on this issue since it's been lingering for so long.
Hi @ajkagy, thanks for reporting this. To help us reproduce the issue, it would be very helpful if you could provide the following information:
1. Clio's error log
2. The ETL rippled's error log
3. The versions of Clio and its ETL rippled
Thanks for the quick reply here @cindyyan317
Attached is a clio log from startup. It seems to choke on ledger 75681445 after 10 retry attempts and then adds other ledgers to the ETL queue, but ledger_range never updates to skip the ledger it can't receive, even though rippled has this validated ledger in its db.
Edit: rippled version: 1.12.0, clio version: clio-2.0.0, cassandra version: 4.1.3
Working on pasting my rippled ETL log here.
Adding another note here. A common denominator seems to be that clio can't progress on ledgers that have a large number of NFT mint txns. Trying to determine whether it's a downstream cassandra issue and not necessarily clio.
Here are a few more ledgers that basically stop clio from progressing: https://bithomp.com/ledger/75755384 https://bithomp.com/ledger/75755417
@cindyyan317 update: I was able to reproduce this on testnet with the exact same rippled, clio and cassandra versions. clio stops progressing.
Here is the ledger: https://test.bithomp.com/ledger/43473563
This seems like a completely different issue altogether, but I couldn't find an existing open issue for it.
@ajkagy Thanks for the details. We can't reproduce this; the problematic ledgers can be processed by our nodes. From the log, it seems clio got stuck when writing to the db.
When was the node upgraded to Clio 2.0? Did it start fresh, or was it migrated? Can you also share the Backend log?
@cindyyan317 The node is fresh. I'm pretty positive this is a configuration issue tied to the particular cassandra version and not a Clio issue, since we're having no issues with earlier cassandra versions or with scylla (which uses an earlier cassandra version). I'll try to get some more detailed logs together.
Thanks for your help!