goose icon indicating copy to clipboard operation
goose copied to clipboard

perf: Reduce allocations in request metrics hot path

Open jeremyandrews opened this issue 4 months ago • 9 comments

This pull request resolves the performance issue #638 by optimizing the GooseRequestMetric struct to reduce heap allocations and string cloning.

The previous implementation made unnecessary string allocations for every request. This change addresses the bottleneck by:

  • Replacing transaction_index: String with usize
  • Converting scenario_name and name from String to Arc<str> to enable efficient sharing across multiple metrics without cloning

These changes remove heap allocations from the hot path of metrics creation. Initial testing with the transaction_index conversion showed:

  • Requests per Second (RPS): Increased by 7.2% (from 453.68 to 486.46)
  • Average Response Time: Decreased by 9.9% (from 299.18 ms to 269.62 ms)

Performance testing is still needed to confirm the impact of the Arc<str> changes.

A future PR could explore using the arcstr crate for additional optimizations.

  • Closes #638

jeremyandrews avatar Aug 15 '25 15:08 jeremyandrews

@gemini-cli /review

LionsAd avatar Aug 15 '25 15:08 LionsAd

@gemini-cli /review

LionsAd avatar Aug 16 '25 00:08 LionsAd

There is a problem with the Gemini CLI PR review. Please check the action logs for details.

github-actions[bot] avatar Aug 16 '25 00:08 github-actions[bot]

@gemini-cli /review

LionsAd avatar Aug 16 '25 19:08 LionsAd

I've removed redundant if transaction_detail.transaction_index.is_empty() check since parse::<usize>() on empty strings fails anyway and unwrap_or(usize::MAX) handles it correctly.

I'm unable to perform the recommended Cow::Borrowedbecause thenameparameter comes fromself.get_request_name()which returns dynamic strings (request paths like/api/users/123, user-defined names, runtime configuration). These don't have 'staticlifetime required byCow<'static, str>`.

jeremyandrews avatar Aug 18 '25 06:08 jeremyandrews

That last review confused me. So therefore I need to revise my RTBM: Did you measure that the cow changes have an effect as well or is it only the usize change that makes things faster?

LionsAd avatar Aug 25 '25 10:08 LionsAd

Explanation by Gemini:

No, in the specific way you've used it, there is no performance advantage to using Cow::Owned over String. In fact, it's slightly less performant and uses marginally more memory.

The change from String to Cow::Owned(some_string) adds a small, unnecessary layer of abstraction without providing any benefit. Let's break down why.


The Real Power of Cow: Avoiding Allocations

Cow stands for Clone-on-Write. Its entire purpose is to help you avoid allocating new memory (like creating a new String) if you don't have to. It's a smart enum that can hold either a borrowed reference or an owned value.

The performance gain from Cow comes when you can use the Cow::Borrowed variant.

Consider a function that sometimes needs to modify a string and sometimes doesn't.

Without Cow (Always Allocates):

// This function *always* allocates a new String, even if the input doesn't change.
fn process_message(msg: &str) -> String {
    if msg.starts_with("error:") {
        // We need to modify it, so we create a new String.
        msg.to_uppercase() 
    } else {
        // No change needed, but the return type is String,
        // so we are forced to allocate and copy anyway.
        msg.to_string() // <-- Unnecessary allocation!
    }
}

With Cow (Allocates Only When Needed):

use std::borrow::Cow;

// This function only allocates when it's absolutely necessary.
fn process_message_efficiently(msg: &str) -> Cow<'_, str> {
    if msg.starts_with("error:") {
        // Modification is needed, so we allocate and return an Owned variant.
        Cow::Owned(msg.to_uppercase())
    } else {
        // No change needed! We can just return a borrowed slice.
        // No allocation, no copying. Super fast!
        Cow::Borrowed(msg) // <-- The winning move!
    }
}

How this applies to your struct:

You should use Cow<'static, str> in your struct if some instances of that struct can use a &'static str (a string literal compiled into your program), while others need a String created at runtime.

use std::borrow::Cow;

pub struct Scenario {
    pub scenario_name: Cow<'static, str>,
}

// ✅ PERFECT use case for Cow
// This one uses a static string literal. No allocation for the name.
let default_scenario = Scenario {
    scenario_name: Cow::Borrowed("Default Login"), 
};

// This one is from a dynamic source. Here we must allocate.
let dynamic_name = "User_42_special_case".to_string();
let custom_scenario = Scenario {
    scenario_name: Cow::Owned(dynamic_name),
};

Back to Your Example

Let's analyze your specific change:

  • Old Code: scenario_name: transaction_detail.scenario_name.to_string(),

    1. ...to_string(): Creates a new, heap-allocated String. This is the main performance cost.
    2. The String (which is 3 words: pointer, length, capacity) is stored in your struct.
  • New Code: scenario_name: std::borrow::Cow::Owned(transaction_detail.scenario_name.to_string()),

    1. ...to_string(): Creates a new, heap-allocated String. The main performance cost is still here.
    2. Cow::Owned(...): This wraps the String you just created inside the Owned variant of the Cow enum.
    3. The Cow is stored in your struct. On a 64-bit system, a Cow is often slightly larger than a String on the stack (e.g., 4 words vs 3 words) because it needs to store the enum variant tag in addition to the string data.

You are performing the expensive allocation in both cases, but in the second case, you are adding the extra step of wrapping it in an enum. You never use the Cow::Borrowed variant, so you get all of the overhead of Cow with none of the benefits.


Summary: String vs. Cow

  • Use String when: You know you will always have an owned, mutable string. This is the most common and straightforward case.

  • Use Cow<'a, str> when: Your field could either be a cheap, borrowed reference (like a 'static literal) or an owned String. The goal is to default to borrowing and only pay the cost of owning/allocating when absolutely necessary. 👍

LionsAd avatar Aug 25 '25 11:08 LionsAd

Discussion with gemini:

Fabian: Is there a way to do this with Cow::Borrowed or another tool like Arc(), where it's the same string over and over, but we don't want to do the index thingy [which obviously still would work].

Gemini said:

Yes, there is a perfect tool for this, and your intuition is spot on. Arc<str> is the ideal solution for sharing the same immutable string data across many structs without using indices or lifetimes.

Arc stands for Atomically Reference Counted. Think of it as a smart pointer that lets multiple "owners" share the same piece of data. The data is only cleaned up when the very last owner is gone.

This directly solves your problem:

* You have one `String` for "Scenario A".
* You can have thousands of `GooseRequestMetric` structs that all need to 

refer to "Scenario A". * Instead of each one cloning the String (expensive) or holding an index (inconvenient), they can each hold a cheap, lightweight Arc pointing to the single, shared "Scenario A" string data.


The Arc<str> Solution

Here’s how you would refactor your code to use Arc<str>.

1. Update Your Core Structs

First, change your Scenario and Transaction structs to hold the names as Arc<str>. You do this once when you first load your configuration.

use std::sync::Arc;

pub struct GooseAttack {
    pub scenarios: Vec<Scenario>,
    // ...
}

pub struct Scenario {
    // This name is now a reference-counted string slice.
    pub name: Arc<str>,
    pub transactions: Vec<Transaction>,
    // ...
}

pub struct Transaction {
    // This name is also a reference-counted string slice.
    pub name: Arc<str>,
    // ...
}

2. Update Your Metric Struct

Your GooseRequestMetric will now also hold Arc<str> for the names. Cloning an Arc is extremely cheap—it just copies a pointer and atomically increments a counter. It does not copy the underlying string data.

use std::sync::Arc;

// The metric is now self-contained. No more indices!
pub struct GooseRequestMetric {
    pub scenario_name: Arc<str>,
    pub transaction_name: Arc<str>,
    pub name: String, // Keep this as a String since it's unique per request.
    // ... other fields
}

impl GooseRequestMetric {
    pub fn new(
        scenario: &Scenario,
        transaction_index: usize,
        name: String,
        // ... other params
    ) -> Self {
        GooseRequestMetric {
            // Arc::clone() is the magic. It's super fast!
            scenario_name: scenario.name.clone(),
            transaction_name: 
scenario.transactions[transaction_index].name.clone(),
            name, // The unique name is still moved in.
            // ... other fields
        }
    }
}

3. How to Create the Initial Arc<str>

When you first parse your scenario configuration from a file or some other source, you'll have Strings. You can easily convert them into Arc<str>.

// During setup...
let scenario_name_from_config: String = "My E-Commerce Checkout".to_string();

let scenario = Scenario {
    // Convert the String into an Arc<str> once.
    name: Arc::from(scenario_name_from_config.as_str()),
    transactions: vec![/* ... */],
};

Comparison of Approaches

This table summarizes why Arc<str> is the big winner here.

Approach Memory Usage CPU Cost (Creation) Ergonomics (Ease of Use)
Cloning String High (each metric gets a full copy) High (heap allocation per string) Easy, but slow and memory-hungry.
Indexing (usize) Very Low (just stores integers) Very Low (just copies integers) Poor. The metric isn't self-contained; you always need the GooseAttack struct to get the name.
Arc<str> Very Low (stores a single pointer) Very Low (a pointer copy and atomic increment) Excellent. The metric is self-contained and easy to pass around, even between threads.

Why Not Cow?

Cow (Clone-on-Write) is the wrong tool for this specific problem because it's designed for optional ownership, not shared ownership.

Using Cow::Borrowed would look like this:

// This is what you would have to do with Cow
pub struct GooseRequestMetric<'a> { // <-- Unwanted lifetime 'a
    pub scenario_name: Cow<'a, str>, // tied to the lifetime of GooseAttack
    // ...
}

You would re-introduce the very thing you want to get rid of: a lifetime ('a) that ties your metric struct to the GooseAttack struct. This makes it hard to store the metrics or send them to another thread for processing.

Arc<str> has no lifetime parameter because it manages the life of the data itself through reference counting. It's the perfect tool when you want to share read-only data cheaply and easily. 👍

LionsAd avatar Aug 25 '25 11:08 LionsAd

As a first step, I've gone from Cow -> Arc. I'd like to load test this change first, and confirm that then going to ArcStr is a significant win.

jeremyandrews avatar Aug 26 '25 12:08 jeremyandrews