nats.rs

Possible memory leak on high messages throughput with v0.20.1

Open mikhailantoshkin opened this issue 2 years ago • 7 comments

Make sure that these boxes are checked before submitting your issue -- thank you!

  • [x] Included below version and environment information
  • [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)

NATS version (grep 'name = "nats"' Cargo.lock -A 1)

version = "0.20.1"

rustc version (rustc --version - we support Rust 1.41 and up)

rustc 1.59.0 (9d1b2106e 2022-02-23)

OS/Container environment:

Linux 5.15.0-30-generic #31-Ubuntu SMP Thu May 5 10:00:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Steps or code to reproduce the issue:

First discovered as unusual memory usage in Vector's NATS source, as described in this issue.

When running this code and hitting NATS with ~800k msg/sec, memory usage shoots up, and after all messages are processed the memory is not reclaimed by the system.

Receiver

use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let nc = nats::asynk::connect("172.10.0.1:4222").await?;
    let sub = nc.subscribe("msg.test").await?;

    // Receive messages until the subscription is closed.
    while let Some(msg) = sub.next().await {
        println!("{:#?}", msg);
    }
    Ok(())
}

Producer snippet

    // MSG is the message payload (&[u8]) defined elsewhere in the benchmark.
    let nc = nats::connect("172.10.0.1:4222").unwrap();
    loop {
        nc.publish("msg.test", MSG).unwrap();
    }

nats-top output (screenshot)

Memory usage (screenshots)

Expected result:

After all messages have been processed, memory is reclaimed by the system.

Actual result:

Memory leak

If there is anything else I can provide for further clarification, please let me know.

Thanks

mikhailantoshkin avatar May 25 '22 06:05 mikhailantoshkin

@mikhailantoshkin thanks for the report. I'm looking into this.

Jarema avatar May 25 '22 11:05 Jarema

@mikhailantoshkin

Which side encounters the leak? The receiving end?

Jarema avatar May 25 '22 11:05 Jarema

@Jarema yes. I'm not sure what causes it, but up to some threshold memory usage does not increase at all, and then it skyrockets.

mikhailantoshkin avatar May 26 '22 16:05 mikhailantoshkin

Here are some screenshots from the setup I'm using to reproduce this.

mikhailantoshkin avatar May 26 '22 17:05 mikhailantoshkin

I created a repo with my test setup to reproduce the issue

mikhailantoshkin avatar May 26 '22 17:05 mikhailantoshkin

I decided to retest it with version 0.23.0 and also run it under bytehound. Weirdly enough, the leak does not reproduce when running the nats-eater binary from the MCVE repo under bytehound, but running it standalone still reproduces the memory leak.

Anyway, here is the link to the profiling report (452 MB uncompressed).

Also, it does not reproduce with async-nats 0.19.0, though that client is slower according to nats-top.
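
For reference, this is roughly how the receiver looks when ported to async-nats; the API calls (connect, subscribe, the futures StreamExt import) are assumed from the 0.19-era client, and the address/subject are the same placeholders as above.

use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    let client = async_nats::connect("172.10.0.1:4222").await?;
    let mut subscriber = client.subscribe("msg.test".to_string()).await?;

    // Receive messages until the subscription is closed.
    while let Some(msg) = subscriber.next().await {
        println!("{:#?}", msg);
    }
    Ok(())
}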

mikhailantoshkin avatar Sep 04 '22 18:09 mikhailantoshkin

In your loop where you are using sleep() to try to achieve a certain message rate per second, the sleep time is increased by the async-nats client's default flush interval, so you are guaranteed (with default settings) to get lower bandwidth than the target set by the CLI parameter.

One change that will increase bandwidth with async-nats is to call flush() after publish. For performance reasons, the default flush interval (the time the client waits to see if there are more messages to send before flushing on its own) is non-zero. The NATS team is planning to do some performance tuning of the default interval, but for unusual situations like this benchmark you probably want non-default values anyway.
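
As a minimal sketch of "publish, then flush explicitly", assuming the async-nats client API of that era (the address and subject are placeholders):

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    let client = async_nats::connect("172.10.0.1:4222").await?;
    client.publish("msg.test".to_string(), "payload".into()).await?;
    // flush() pushes buffered messages onto the wire immediately instead of
    // waiting for the client's internal flush interval to elapse.
    client.flush().await?;
    Ok(())
}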

There are still two issues in this benchmark that interfere with the two goals of hitting a specific target message rate and getting great throughput. To increase accuracy, you could use Interval to measure elapsed time and sleep only the difference between the desired interval and the time already elapsed. To increase throughput, publish messages in batches of N, followed by one call to flush() per batch, and increase the expected interval by a factor of N. That lets you amortize the overhead of Rust thread switching and take advantage of TCP and network efficiencies. I might try N=100, or N=1000 if the messages are very small.

Also, println!() may become a bottleneck, so you might want to print only every Nth message as well.
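
Putting those suggestions together, here is a rough sketch of a paced, batched publisher. The constants (BATCH, TARGET_RATE) and the payload are illustrative, not taken from the benchmark, and the async-nats calls are assumed from the 0.19-era API:

use std::time::{Duration, Instant};

const BATCH: usize = 100;          // messages per flush
const TARGET_RATE: u64 = 800_000;  // desired messages per second

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    let client = async_nats::connect("172.10.0.1:4222").await?;
    // Time budget for one whole batch at the target rate.
    let batch_interval = Duration::from_secs_f64(BATCH as f64 / TARGET_RATE as f64);
    let mut sent: u64 = 0;

    loop {
        let start = Instant::now();
        for _ in 0..BATCH {
            client.publish("msg.test".to_string(), "payload".into()).await?;
        }
        // One flush per batch amortizes the write overhead over N messages.
        client.flush().await?;
        sent += BATCH as u64;

        // Print progress only occasionally so stdout does not become the bottleneck.
        if sent % 1_000_000 == 0 {
            println!("published {} messages", sent);
        }

        // Sleep only for whatever is left of this batch's time budget.
        if let Some(remaining) = batch_interval.checked_sub(start.elapsed()) {
            tokio::time::sleep(remaining).await;
        }
    }
}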

stevelr avatar Sep 04 '22 21:09 stevelr