
Improve scaling of commitments and blacklist persistence

Open AdamISZ opened this issue 7 years ago • 4 comments

The files blacklist and commitments.json store lists of used commitment hashes in hex. The first of these lists in particular is likely to grow large. It would be good to optimise access to (and perhaps storage of) these lists.

This is not a high priority issue, but should probably be addressed at some point.

blacklist: This is just an unformatted list of hashes, one per line. It is processed in joinmarket.configure.check_utxo_blacklist(). The processing is as primitive as conceivable: open the file, read the whole list into memory, update it, then reopen the file and overwrite it (although there is a lock to prevent threads conflicting in access). This file grows with the global transaction volume (if Makers have jm_single().accept_commitment_broadcasts set "on", which they do by default); that growth is fairly slow today but may speed up considerably. We probably need a smarter way to use and access the file once it starts to get large.
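As an illustration of what "smarter" access could look like, here is a minimal sketch that loads the file once into an in-memory set and appends new hashes instead of rewriting the whole file. The class name `CommitmentBlacklist` and its method are hypothetical and not part of JoinMarket; the return convention (True for a new commitment) is also an assumption. Only the one-hash-per-line file format is taken from the description above.

```python
import os
import threading


class CommitmentBlacklist:
    """Hypothetical helper, not JoinMarket code: set lookup plus append-only writes."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()
        self.seen = set()
        if os.path.exists(path):
            with open(path) as f:
                # Existing format: one hex hash per line.
                self.seen = {line.strip() for line in f if line.strip()}

    def check_and_add(self, commitment):
        """Return True if the commitment is new (and persist it), else False."""
        with self.lock:
            if commitment in self.seen:
                return False
            self.seen.add(commitment)
            # Append a single line; the rest of the file is never rewritten.
            with open(self.path, "a") as f:
                f.write(commitment + "\n")
            return True
```

With this shape, a lookup is an average constant-time set membership test and each update writes one line, so the cost of an update no longer depends on the size of the file. The file stays compatible with the current plain-text format.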

In the much longer term it might be OK to delete old sections of the blacklist, but we can discuss that if it ever becomes an issue. Individual Makers losing all or part of their blacklists doesn't matter too much.

commitments.json: This file is found by default in the cmttools folder, and is formatted/pretty-printed JSON. It is accessed in bitcoin.podle.update_commitments() (TODO: refactor commitments logic; read/write should not be in the bitcoin package/module), and the same comments as above apply about primitive processing.

It has two sections, used and external; the latter is controlled entirely by the user and is not likely to grow. The former will only grow large if the user is a very active Taker, so this is unlikely to be an issue (a bigger practical issue is that Takers need to not lose their commitments.json; it's a major pain for them if they do!). Still, any optimisation found for blacklist might also apply here.

A final note: commitments.json should really stay human-readable, while it is fine for blacklist to be unreadable.
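Since losing commitments.json is painful for Takers, one low-effort safeguard is to write it atomically while keeping it pretty-printed. The following is a minimal sketch, assuming Python 3 (for `os.replace`); the function name `save_commitments` is hypothetical, and the exact shapes of the used and external sections are not assumed beyond being JSON-serialisable.

```python
import json
import os
import tempfile


def save_commitments(path, used, external):
    """Hypothetical helper: write readable JSON to a temp file, then swap it in."""
    data = {"used": used, "external": external}
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=4)  # keep the file human-readable
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # replaces the old file in a single step
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

If the process dies partway through the write, the previous commitments.json is left untouched rather than truncated.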

AdamISZ avatar Sep 20 '16 17:09 AdamISZ

Could BlockchainInterface.query_utxo_set() be used to tell whether a utxo has already been spent, and therefore that its commitment can be deleted?

chris-belcher avatar Sep 21 '16 12:09 chris-belcher

@chris-belcher yes, I was musing about that on IRC yesterday, but my feeling was that it's a very small win, because you'll only be able to do it for commitments for transactions that you were involved in. For other txs that information is inaccessible.

Still, if it's easy, it can be done (and I think it is relatively easy - ~~you'd have to wait until such time you know it was actually pushed successfully~~ edit: not exactly. Only a certain proportion of the time will the utxo used for the commitment also be consumed in the transaction; it could also be another utxo in the wallet, or an external-to-wallet utxo. So I suppose just a dumb query_utxo_set at some point in the future, meaning you'd have to keep a record of the used utxos. Meh, a bit messy, might not be worth putting time into).
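For concreteness, the "dumb query at some point in the future" idea could be sketched roughly as follows. Everything here is an assumption for illustration: the record of used utxos (a dict from utxo string to commitment hash) does not exist today, `prune_spent` is a hypothetical name, and the `query_utxo_set` argument is a stand-in callable assumed to take a list of utxos and return a same-length list with None for any utxo that is no longer unspent.

```python
def prune_spent(used_utxos, query_utxo_set):
    """Split recorded commitments into those still needed and those prunable.

    used_utxos: dict mapping a utxo string such as "txid:n" to the
        commitment hash made from it (a hypothetical record).
    query_utxo_set: stand-in callable; assumed to return None for
        utxos that have been spent.
    """
    utxos = list(used_utxos)
    results = query_utxo_set(utxos)
    # Commitments whose source utxo is gone can no longer be reused.
    spent = {used_utxos[u] for u, r in zip(utxos, results) if r is None}
    remaining = {u: c for u, c in used_utxos.items() if c not in spent}
    return remaining, spent
```

This only helps for commitments whose utxos you recorded yourself, which is the limitation noted above.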

AdamISZ avatar Sep 21 '16 16:09 AdamISZ

Could there be a timeout, so that a commitment is banned only for (say) 2 months and then deleted from disk?

It would still impose massive rate limiting on a spy.
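A minimal sketch of the timeout idea, assuming each blacklist entry carries the unix time at which it was first seen (the current file stores bare hashes only, so this implies a format change). The names `prune_expired` and `is_banned` and the 60-day constant are illustrative, not existing JoinMarket code.

```python
import time

TWO_MONTHS = 60 * 24 * 60 * 60  # roughly two months, in seconds


def prune_expired(entries, now=None, max_age=TWO_MONTHS):
    """Return only the entries that are still within the ban window.

    entries: dict mapping commitment hex -> unix time first seen
        (an assumed in-memory form of the blacklist).
    """
    now = time.time() if now is None else now
    return {c: t for c, t in entries.items() if now - t < max_age}


def is_banned(entries, commitment, now=None, max_age=TWO_MONTHS):
    """True if the commitment was seen less than max_age seconds ago."""
    now = time.time() if now is None else now
    seen = entries.get(commitment)
    return seen is not None and now - seen < max_age
```

Pruning could run once at startup, which bounds the file size by the volume seen in the last two months rather than by all history.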

chris-belcher avatar Nov 03 '16 00:11 chris-belcher

Yeah, I was thinking that too yesterday, a just-keep-it-simple kind of approach.

The thing is, as described above, the way it's handled now is completely dumb; it shouldn't require more than a fairly minimal coding effort to streamline access to and control of the list, so that you can easily have, say, 100k entries or more without it being an issue.

AdamISZ avatar Nov 03 '16 07:11 AdamISZ