Tracking Issue: Backfilling of transfer data
Hi TB team (adding this here for tracking)
If a TigerBeetle client 🐯 wants to add historical data, for example the past 2 years from the OLGP, while keeping timestamps consistent, it would be useful to have a 'backfill' mode or a short-term tool where timestamp fields can be passed in.
If it's a flag, maybe the cluster could start in a --backfill mode. Once the cluster is restarted without the flag, backfilling is no longer possible and traffic flows normally (with no way to override timestamps).
This would be useful for:
- Clients who want to use TigerBeetle as the single source of truth for their ledger (by backfilling their data).
- Making queries based on timestamp ranges (windows) work seamlessly, as opposed to using user_data_x, which may not support range queries for some time.
Thank you 🙏🏼
In general, this might be a tad complicated, because timestamps in TigerBeetle are server-generated and serve as the (internal) primary key. So just ramming in arbitrary timestamps won't work.
That said, we actually have something like this implemented! We support backfilling a TigerBeetle cluster from a different TigerBeetle cluster through an AOF (append-only file).
This isn't very well documented yet, and I don't think we've tried using it for importing data from other systems, but it should be possible!
Hi @matklad, thanks for getting back to me on this.
I think the AOF file could work perfectly, as this is usually a one-off task for clients.
When the team has time, any documentation or steps for this would be great to have so it can be tested.
A sample AOF file would also be ace, so I can try generating my own.
I saw this PR but couldn't find the sample test.aof: https://github.com/tigerbeetle/tigerbeetle/pull/355
Hey @jamalzkhan, a good starting point is the test_aof.sh file, which runs tests for it in CI.
Feel free to ask if you have any questions about what it's doing!
Hi @cb22, thank you. I tried running test_aof.sh, but when I opened the AOF file to add my own data I couldn't find a reader/writer for it. I think it's probably an internal binary format.
Any pointers on how I could construct an AOF file from a simple format like CSV or JSON and pipe it into some tool?
Following up on what we discussed in our design meeting:
Allow backfilling accounts and transfers through a pair of new operations, import_accounts and import_transfers, which would behave mostly like create_{accounts, transfers} but with different validations on the timestamp:
- Timestamps must be set by the user, such that timestamp > 0 and timestamp < now (no future timestamps are allowed).
- Timestamps must be unique across the entire database, i.e. no two objects can have the same timestamp; even objects of different kinds, such as a transfer and an account, cannot share the same timestamp.
- Timestamps must always be increasing, such that timestamp > last_timestamp_inserted. That is, once an object with timestamp X is created, an object with timestamp X - 1 is not expected to be imported. This validation has a deep impact on how the data source must be organized prior to importing into TigerBeetle. For example, batches of accounts and transfers must be submitted in chronological order, interleaving accounts and transfers in the expected order. Multiple clients/threads may interfere with this order if they are not properly synchronized by the application layer.
- The import process is intended to be used in a fresh cluster before any create_{accounts, transfers} call. It's not explicitly forbidden to use import_{accounts, transfers} in a regular cluster; however, the validation rule timestamp > last_timestamp_inserted and timestamp < now makes it naturally restrictive for purposes other than the first migration into an empty database.
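The timestamp rules above put the burden of ordering on whoever prepares the data. The Python sketch below is only an illustration under stated assumptions: Record, validate_import_order and plan_batches are hypothetical helpers, not part of any TigerBeetle client, and timestamps are assumed to be plain integers (e.g. nanoseconds). It merges timestamp-sorted accounts and transfers into one chronological stream, cuts it into batches that each hold a single kind (since import_accounts and import_transfers are separate operations), and checks 0 < timestamp < now plus strictly increasing timestamps before anything is submitted.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape: kind is "account" or "transfer"; timestamp is
    # user-supplied and assumed to be an integer (e.g. nanoseconds).
    kind: str
    timestamp: int
    id: int


def validate_import_order(records: Iterable[Record], last_timestamp: int, now: int) -> None:
    """Raise ValueError on the first record violating the proposed rules:
    0 < timestamp < now, and timestamps strictly increasing (which also makes
    them unique across accounts and transfers). last_timestamp is the last
    timestamp already in the cluster (0 for a fresh cluster)."""
    previous = last_timestamp
    for record in records:
        if not 0 < record.timestamp < now:
            raise ValueError(f"{record.kind} {record.id}: timestamp out of range")
        if record.timestamp <= previous:
            raise ValueError(f"{record.kind} {record.id}: timestamp not increasing")
        previous = record.timestamp


def plan_batches(
    accounts: Iterable[Record], transfers: Iterable[Record], max_batch: int
) -> Iterator[Tuple[str, List[Record]]]:
    """Merge two timestamp-sorted streams and group consecutive records of the
    same kind into batches of at most max_batch, keeping global order."""
    merged = heapq.merge(accounts, transfers, key=lambda r: r.timestamp)
    for kind, group in itertools.groupby(merged, key=lambda r: r.kind):
        records = list(group)
        for start in range(0, len(records), max_batch):
            yield kind, records[start:start + max_batch]


if __name__ == "__main__":
    accounts = [Record("account", 10, 1), Record("account", 11, 2), Record("account", 40, 3)]
    transfers = [Record("transfer", 20, 100), Record("transfer", 30, 101), Record("transfer", 50, 102)]
    batches = list(plan_batches(accounts, transfers, max_batch=2))
    validate_import_order((r for _, b in batches for r in b), last_timestamp=0, now=1_000)
    for kind, batch in batches:
        print(kind, [r.timestamp for r in batch])
```

Because strictly increasing timestamps are automatically unique, one pass covers both the uniqueness and the ordering rules. Submitting the resulting batches sequentially from a single client preserves the order that several uncoordinated threads could break.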
Let’s call it “import_accounts/transfers” rather than “backfill”:
- “Import” has a nice symmetry with “export”.
- “Backfill” might give the impression of reverse chronological order; “back” reminds me of “backwards” (as in a step backwards).
- And… code haiku… “import” lines up in src with “create” 😛
Hi @batiati
I went through the thinking on this design. It's spot on. Regarding the validations, I think this means that when users are about to start using TigerBeetle properly, they will need to prepare their relevant data and ensure everything is in order.
Appreciate the time taken by the team on this matter and I'm sure this will make a huge difference to all the users. 🙏🏼
This feature is also important/useful for integration testing and simulation.
Hi @batiati,
Coming from #1968, I think the design you described here should work for the migration use case. I am curious what kind of interfaces you have in mind. For example, should the accounts/transfers be organized in a certain file format, or is it the user's responsibility to build their own importer using the provided APIs (so that the file format does not matter)?
Thanks!
Closed by https://github.com/tigerbeetle/tigerbeetle/pull/2171