atuin server sync slow with 50k+ items
Hello,
I have the atuin server running on my local Kubernetes cluster. Syncing my 50k+ item history is fairly slow (30s-120s). From what I can tell, the sync operation is doing 100 history items per batch:
2023-01-29T03:51:20.640330Z DEBUG hyper::proto::h1::io: parsed 12 headers
2023-01-29T03:51:20.640413Z DEBUG hyper::proto::h1::conn: incoming body is empty
2023-01-29T03:51:20.640822Z DEBUG request{method=GET uri=/sync/count version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2023-01-29T03:51:20.650692Z DEBUG request{method=GET uri=/sync/count version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=51 ms status=200
2023-01-29T03:51:20.651127Z DEBUG hyper::proto::h1::io: flushed 123 bytes
It looks like limiting the process to 100 lines per request introduces a fair amount of overhead. It also looks like the requests are made sequentially, one at a time. Making the batch size and the number of concurrent operations configurable should improve sync performance quite a bit.
$ time atuin sync
Sync complete! 50626 items in database, force: false
atuin sync 2.36s user 1.24s system 6% cpu 58.448 total
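Back-of-the-envelope: 50,626 items at 100 per request is roughly 507 sequential round trips, and with per-request latencies in the region of the 51 ms shown for /sync/count above, that alone is on the order of 25s of waiting, a big chunk of the ~58s total.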
I wanted to try bumping the HISTORY_PAGE_SIZE, but I don't have the Rust toolchain set up to build this for my macOS client. A build guide would be very welcome!
Thanks
Hey! We could make this configurable 🤔 Though, you should be able to get much more performance from the server than that. My deployment pretty routinely hits 1000s of req/s without sweating.
Really, it's just the client being conservative by default. I'm more than happy to make the batch size configurable, but would rather keep it making serial requests. The user should not experience a sudden surge in open sockets just because their shell history is syncing.
You can install the toolchain with https://rustup.rs, and Atuin can be built in the standard way with cargo build.
Thanks for your reply @ellie!
I'm running my deployment on my homelab Kubernetes cluster; it's not state-of-the-art when it comes to performance. The PostgreSQL DB also runs on network block storage.
Some findings:
- Setting up a local Docker-based server (both atuin and PostgreSQL) on my M1 MacBook Air results in syncing in 6-7s (~7150 lines/s), which is pretty good!
- My deployment consists of two Pods, one with Postgres and one with atuin
- When postgres is running on my fastest worker, and the atuin Pod is on the same node, syncing takes ~27s (~1850 lines/s)
- When postgres is running on my fastest worker, but the atuin Pod is running on another node, syncing takes ~140s (~350 lines/s)
- When postgres is running on a slightly slower worker, and the atuin Pod is on the same node, syncing takes ~44s (~1150 lines/s)
- Finally, when postgres is running on the slightly slower worker, but the atuin Pod is on the fastest worker, syncing takes ~130s (~380 lines/s)
So there is a significant cost to running atuin on a separate node from the Postgres node. I don't know the architecture, but it seems like each batch needs to do a full round trip to the database before the response returns to the client and the next batch can begin.
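Put differently: ~350 lines/s with 100-line batches means only ~3.5 batches complete per second, i.e. each batch takes roughly 280-290 ms end to end, which is about what you'd expect once cross-node network hops and network block storage latency get paid on every single request.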
Would it be possible to use more memory on the atuin server side and buffer the client's batches, so the server can return to the client quickly and then spend time persisting to the database in the background?
This would enable a crisp, low latency experience for the client and offload the waiting/processing to the server. The tradeoff is that the client must trust that the server eventually persists the data. I think this is an OK tradeoff given this is mainly a sync server and most of the time the state is with the clients.
What do you think?
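To make the idea concrete, here is a rough sketch of the kind of write-behind buffering I mean. This is purely illustrative and not Atuin's actual code; the types, the tokio channel, and the persist function are all made up for the example:

use std::time::Duration;
use tokio::sync::mpsc;

#[derive(Debug)]
struct HistoryBatch {
    items: Vec<String>,
}

// Stand-in for the real database write; pretend it is slow.
async fn persist_to_db(batch: HistoryBatch) {
    tokio::time::sleep(Duration::from_millis(50)).await;
    println!("persisted {} items", batch.items.len());
}

#[tokio::main]
async fn main() {
    // A bounded channel caps memory use while decoupling the client from the DB.
    let (tx, mut rx) = mpsc::channel::<HistoryBatch>(32);

    // Background writer task drains the buffer into the database.
    let writer = tokio::spawn(async move {
        while let Some(batch) = rx.recv().await {
            persist_to_db(batch).await;
        }
    });

    // The "handler": enqueue the batch and return to the client right away.
    for i in 0..5 {
        let batch = HistoryBatch {
            items: vec![format!("history item {i}"); 100],
        };
        tx.send(batch).await.unwrap(); // returns as soon as the buffer has room
    }

    drop(tx); // close the channel so the writer task can drain and finish
    writer.await.unwrap();
}

The bounded channel is the trade-off knob: the client gets an immediate response as long as there is room in the buffer, and back-pressure kicks in (the handler waits) once the database falls too far behind.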
So I installed the Rust toolchain and built a couple of versions with higher batch sizes, then ran them with the fastest combination from before (Postgres on the fastest worker, with the atuin Pod on the same node).
I also switched to wired Ethernet to make sure the network connection doesn't interfere. I was able to significantly reduce the sync time just by switching to Ethernet, which indicates that latency makes this much worse. (FWIW my wi-fi setup is WiFi 6, 1300 Mbps, free line of sight, but you can't beat wired! 😊)
- 100 lines/batch (baseline): 17-18s (~3000 lines/s)
- 1000 lines/batch: 13-14s (~3600 lines/s)
- 1500 lines/batch: 6s (~8500 lines/s)
- 2000 lines/batch: 5-6s (~9200 lines/s) - worth noting that this matches the performance of running atuin and postgres locally with the default HISTORY_PAGE_SIZE of 100
- Going past 2000 results in no extra performance gain; my setup seems to have reached its max here
Now, switching back to wi-fi I see 7-9s with HISTORY_PAGE_SIZE=2000, so that seems to be a sweet spot for my setup, wi-fi or not.
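For reference, the higher-batch-size builds above were nothing fancy; the only change before rebuilding with cargo build was bumping the page size constant in the client code. Shown schematically (the real definition may differ in type or exact location):

// schematic: the only difference in my "2000 lines/batch" build
pub const HISTORY_PAGE_SIZE: i64 = 2000; // default is 100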
I still think the atuin server component could return to the client faster if it buffered the data (unless it's doing that already of course, but given that the memory footprint is tiny I assume it's not).
Thanks 🙏
Thank you for doing this investigation 🙏
I'd be more than happy to allow configuring the batch size, and actually this is something I always had in mind and just... never did 😓 If this is something you'd like to work on I'd be happy to help you out? Otherwise I can try and get to it soon.
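For anyone picking this up, here is one possible shape for it on the client side, entirely as a sketch: the field name, the default, and the use of serde are assumptions, not the actual Atuin settings code:

use serde::Deserialize;

// Hypothetical client setting, e.g. sync_page_size = 2000 in config.toml.
fn default_sync_page_size() -> i64 {
    100 // keep today's behaviour unless the user opts in to larger batches
}

#[derive(Debug, Deserialize)]
pub struct Settings {
    #[serde(default = "default_sync_page_size")]
    pub sync_page_size: i64,
    // ... existing fields ...
}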
Otherwise I don't think we would consider buffering before committing to the database, as
- I'd like an HTTP 200 to mean "your history is safe", and this would increase the risk of data loss
- I'd like to keep the memory usage of the server nice and low
While speed is nice, I would rather not compromise the above in order to chase performance.