
Investigate `mimalloc` leaks starting with 0.1.38

Open teh-cmc opened this issue 1 year ago • 3 comments

Related:

  • https://github.com/rerun-io/rerun/pull/5874

teh-cmc, Apr 09 '24 13:04

I tried to repro the issue with mimalloc 0.1.38 on main as of today, but have failed so far. Given that the data store has been entirely rewritten around chunks, it is not entirely surprising that we no longer trigger the leak in the same way.


For the record, here is the setup I used:

Viewer launched with:

RERUN_CHUNK_MAX_ROWS=1 cargo r -p rerun-cli --no-default-features --features native_viewer --release -- --memory-limit 256MB
Python script, executed with `RERUN_FLUSH_NUM_ROWS=0`:
# %%
from numpy.typing import NDArray
import numpy as np
import rerun as rr


# %%
def gen_color_list(num_colors: int):
    """
    Generates a list of random RGB color values.

    Args:
        num_colors (int): The number of colors to generate.

    Returns:
        list: A list of RGB color values, where each value is a list of three integers between 0 and 255.
    """
    color_list = []
    for _ in range(num_colors):
        r = np.random.randint(0, 256)
        g = np.random.randint(0, 256)
        b = np.random.randint(0, 256)
        color_list.append([r, g, b])
    return color_list


def scale_tsx(tsy: NDArray, ch_count: int) -> NDArray:
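    """
    Offsets a time series array by adding a per-channel constant to each channel.

    Args:
        tsy (NDArray): Samples with shape (num_samples, ch_count).
        ch_count (int): Number of channels, i.e. the second dimension of `tsy`.

    Returns:
        NDArray: Transposed array with shape (ch_count, num_samples), where
            channel `i` is shifted up by `i * (tsy.max() + 2 * tsy.std())`.
    """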
    scale_factor = tsy.max() + (tsy.std() * 2)
    channel_scale_factor = np.arange(ch_count) * scale_factor
    tsx_scaled = tsy.T + channel_scale_factor.reshape(-1, 1)
    return tsx_scaled


# %%
# mock data

# Define sampling parameters
num_channels = 16  # Number of channels
sample_rate = 24000  # Sampling rate in Hz
duration_min = 30  # Duration in minutes

# Calculate total number of samples
total_samples = sample_rate * 60 * duration_min

# Generate random samples for each channel
random_samples = np.random.uniform(-1.0, 1.0, (total_samples, num_channels)).astype(
    np.float32
)

# Print the shape of the generated random samples array
print("Shape of random samples array:", random_samples.shape)
# %%
traces_scaled = scale_tsx(random_samples[:, :], num_channels)
ch_colors = gen_color_list(num_channels)
# %%
print(rr.version())
rr.init("testSubject2", spawn=True)

for ch_id in np.arange(num_channels):
    rr.log(
        f"mockdata/ch{ch_id}",
        rr.SeriesLine(color=ch_colors[ch_id], name=f"ch{ch_id}", width=0.5),
        timeless=True,
    )
# %%
# Log the data on a timeline called "step".
for t in range(0, traces_scaled.shape[1]):
    rr.set_time_sequence("step", t)
    for ch_id in np.arange(num_channels):
        rr.log(f"mockdata/ch{ch_id}", rr.Scalar(traces_scaled[ch_id, t]))
# %%
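
For anyone re-running this repro, it helps to know how much work the script generates before launching it. The following is a small sanity-check sketch, not part of the original script: it reuses the three parameters defined above, and the names `bytes_per_sample`, `array_bytes`, and `scalar_log_calls` are introduced here for illustration only.

```python
# Parameters copied from the repro script above.
num_channels = 16
sample_rate = 24_000  # Hz
duration_min = 30

# Assumption for illustration: 4 bytes per value, matching the float32 cast.
bytes_per_sample = 4

# Number of steps on the "step" timeline (samples per channel).
total_samples = sample_rate * 60 * duration_min

# Size of the `random_samples` array after the float32 cast.
array_bytes = total_samples * num_channels * bytes_per_sample

# One scalar log call per channel per step in the logging loop.
scalar_log_calls = total_samples * num_channels

print(f"samples per channel: {total_samples:,}")
print(f"float32 array size:  {array_bytes / 2**30:.2f} GiB")
print(f"scalar log calls:    {scalar_log_calls:,}")
```

With the values above this comes to 43,200,000 steps and 691,200,000 scalar log calls, so lowering `duration_min` is likely the easiest way to get a quicker run.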

Example run with 0.1.38 (appears stable over time):

[two images]
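
"Appears stable over time" can also be checked numerically instead of by eye. The sketch below is not from this thread: it assumes memory usage of the viewer has been sampled periodically by some external means, and fits a least-squares line through the samples. The function names `memory_growth_rate` and `looks_stable`, the threshold, and the sample values are all hypothetical.

```python
def memory_growth_rate(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (seconds, bytes) samples, in bytes per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def looks_stable(
    samples: list[tuple[float, float]], max_bytes_per_s: float = 1024.0
) -> bool:
    """True if the fitted growth rate stays at or below the (arbitrary) threshold."""
    return memory_growth_rate(samples) <= max_bytes_per_s


# Made-up samples: (seconds since start, resident memory in bytes).
flat = [(0.0, 250e6), (60.0, 252e6), (120.0, 249e6), (180.0, 251e6)]
leaky = [(t, 250e6 + 1e6 * t) for t in (0.0, 60.0, 120.0, 180.0)]

print(looks_stable(flat))
print(looks_stable(leaky))
```

A linear fit is a deliberately crude criterion: it only says whether usage trends upward over the sampled window, so the window should be long enough to cover several garbage-collection cycles triggered by `--memory-limit`.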

abey79, Sep 30 '24 10:09

Original slack thread: https://rerunio.slack.com/archives/C041NHU952S/p1712658365310999

abey79, Sep 30 '24 10:09

Thanks @abey79.

Let's unpin mimalloc as soon as 0.19 ships -- that will give us a few weeks to identify any weird behavior before 0.20.

teh-cmc, Sep 30 '24 10:09

Using the old mimalloc was also the cause of the recent alignment bug:

  • https://github.com/rerun-io/rerun/pull/7563
  • https://github.com/purpleprotocol/mimalloc_rust/issues/128

emilk, Oct 04 '24 02:10

We've been on mimalloc 0.1.43 for a while and have received no complaints 👍

teh-cmc, Dec 13 '24 08:12