nats.rs
Possible memory leak on high message throughput with v0.20.1
Make sure that these boxes are checked before submitting your issue -- thank you!
- [x] Included below version and environment information
- [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)
NATS version (`grep 'name = "nats"' Cargo.lock -A 1`):
version = "0.20.1"
rustc version (`rustc --version` - we support Rust 1.41 and up):
rustc 1.59.0 (9d1b2106e 2022-02-23)
OS/Container environment:
Linux 5.15.0-30-generic #31-Ubuntu SMP Thu May 5 10:00:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Steps or code to reproduce the issue:
First discovered as weird memory usage in Vector when using the NATS source, described in this issue.
When running this code and hitting NATS with ~800k msg/sec, memory usage shoots up, and after all messages are processed the memory is not reclaimed by the system.
Receiver
```rust
use nats;
use std::error::Error;
use tokio;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let nc = nats::asynk::connect("172.10.0.1:4222").await?;
    let sub = nc.subscribe("msg.test").await?;

    // Receive messages.
    while let Some(msg) = sub.next().await {
        println!("{:#?}", msg);
    }
    Ok(())
}
```
Producer snippet
```rust
// MSG is the payload constant used by the producer (its value is not shown in the snippet).
let nc = nats::connect("172.10.0.1:4222").unwrap();
loop {
    nc.publish("msg.test", MSG).unwrap();
}
```
nats-top output (screenshot)
Memory usage (screenshot)
Expected result:
After all messages have been processed, memory is reclaimed by the system
Actual result:
Memory leak
If there is anything else I can provide for further clarification, please let me know
Thanks
@mikhailantoshkin thanks for the report. I'm looking into this.
@mikhailantoshkin Which side encounters the leak? The receiving end?
@Jarema yes. I'm not sure what causes it, but up to some threshold memory usage does not increase at all, and then it skyrockets
Here are some screenshots from the setup I'm using to reproduce this
I created a repo with my test setup to reproduce the issue
Decided to retest it with version 0.23.0 and also ran it under bytehound. Weirdly enough, it does not reproduce when running the nats-eater binary from the MCVE repo under bytehound, but running it standalone still reproduces the memory leak.
Anyway, here is the link to the profiling report (452 MB uncompressed).
Also, it does not reproduce with async-nats 0.19.0, though it is slower according to nats-top.
In your loop where you are using sleep() to try to achieve a certain message rate per second, the sleep time is increased by the async-nats client's default flush interval, so you are guaranteed (with default settings) to get lower bandwidth than the target set by the CLI parameter.
One change that will increase bandwidth with async-nats is to call flush() after publish, as in the sketch below. For performance reasons, the default flush interval (the time the client waits to see whether there are more messages to send before flushing on its own) is non-zero. The NATS team is planning to do some performance tuning of the default interval, but for unique situations like this benchmark you probably want non-default values anyway.
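A minimal sketch of that change, assuming the async-nats ~0.19 API (owned subject `String`, `Bytes` payload) and reusing the server address from the issue; exact signatures may differ between versions:

```rust
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Server address taken from the issue's setup.
    let client = async_nats::connect("172.10.0.1:4222").await?;

    client.publish("msg.test".to_string(), "payload".into()).await?;
    // Push the buffered message out now instead of waiting for the
    // client's default flush interval.
    client.flush().await?;

    Ok(())
}
```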
There are still two issues in this benchmark that interfere with the two goals of getting a specific target message rate and getting great throughput. To increase accuracy, you could use Interval to measure elapsed time and sleep the difference between the desired interval and the elapsed time. To increase throughput, publish messages in batches of N, followed by one call to flush() per batch, and increase the expected interval by a factor of N (see the sketch below). That lets you amortize the overhead of Rust thread switching and take advantage of TCP and network efficiencies. I might try N=100, or N=1000 if the messages are very small.
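A sketch of both suggestions combined: publish a batch of N messages, flush once per batch, and sleep only the remainder of the batch's time budget. `TARGET_MSGS_PER_SEC`, `BATCH_SIZE`, and the payload are hypothetical values for illustration, and the async-nats ~0.19 API is assumed:

```rust
use std::error::Error;
use std::time::Duration;
use tokio::time::{sleep, Instant};

// Hypothetical benchmark parameters.
const TARGET_MSGS_PER_SEC: u64 = 800_000;
const BATCH_SIZE: u64 = 1_000;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let client = async_nats::connect("172.10.0.1:4222").await?;

    // Time budget for one whole batch rather than one message.
    let batch_interval =
        Duration::from_secs_f64(BATCH_SIZE as f64 / TARGET_MSGS_PER_SEC as f64);

    loop {
        let start = Instant::now();

        // Publish a batch, then flush once, amortizing the flush cost over N messages.
        for _ in 0..BATCH_SIZE {
            client.publish("msg.test".to_string(), "payload".into()).await?;
        }
        client.flush().await?;

        // Sleep only the difference between the desired interval and the elapsed
        // time, so publishing/flushing time does not lower the achieved rate.
        let elapsed = start.elapsed();
        if elapsed < batch_interval {
            sleep(batch_interval - elapsed).await;
        }
    }
}
```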
Also, println!() may become a bottleneck, so you might want to print only every N messages as well (see the sketch below).
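For the receiving side, a sketch of that idea applied to the MCVE subscriber (nats::asynk, as in the original receiver); `PRINT_EVERY` is a hypothetical parameter:

```rust
use std::error::Error;

// Hypothetical print interval: log a summary once per PRINT_EVERY messages.
const PRINT_EVERY: u64 = 10_000;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let nc = nats::asynk::connect("172.10.0.1:4222").await?;
    let sub = nc.subscribe("msg.test").await?;

    let mut count: u64 = 0;
    while let Some(msg) = sub.next().await {
        count += 1;
        if count % PRINT_EVERY == 0 {
            // Keep stdout out of the hot path by printing only occasionally.
            println!("received {} messages, last: {:#?}", count, msg);
        }
    }
    Ok(())
}
```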