
Memory leak under Slowloris attack

Open ghostdevxd opened this issue 6 months ago • 39 comments

Version info:

  • Go: go1.23.10 windows/amd64
  • Fasthttp: v1.63.0

What happened

When running a simple fasthttp server and testing it using a Slowloris attack (via goloris), the memory usage of the fasthttp process increases rapidly and does not decrease for many minutes even after the attack ends.

Code

package main

import (
	"github.com/valyala/fasthttp"
)

func main() {
	requestHandler := func(ctx *fasthttp.RequestCtx) {
		switch string(ctx.Path()) {
		case "/":
			ctx.SetStatusCode(fasthttp.StatusOK)
			ctx.SetBody([]byte("Hello, World!"))
		default:
			ctx.SetStatusCode(fasthttp.StatusNotFound)
			ctx.SetBody([]byte(""))
		}
	}

	if err := fasthttp.ListenAndServe(":3000", requestHandler); err != nil {
		panic("Server Error: " + err.Error())
	}
}

Task Manager

Image

PPROF

flat flat% sum% cum cum% Function
231.30MB 98.93% 98.93% 231.30MB 98.93% github.com/valyala/fasthttp.appendBodyFixedSize
0 0% 98.93% 231.30MB 98.93% github.com/valyala/fasthttp.(*Request).ContinueReadBody
0 0% 98.93% 231.30MB 98.93% github.com/valyala/fasthttp.(*Request).ReadBody
0 0% 98.93% 231.30MB 98.93% github.com/valyala/fasthttp.(*Request).readLimitBody
0 0% 98.93% 232.81MB 99.57% github.com/valyala/fasthttp.(*Server).serveConn
0 0% 98.93% 232.81MB 99.57% github.com/valyala/fasthttp.(*workerPool).getCh.func1
0 0% 98.93% 232.81MB 99.57% github.com/valyala/fasthttp.(*workerPool).workerFunc
0 0% 98.93% 231.30MB 98.93% github.com/valyala/fasthttp.readBody
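
For reference, heap profiles like the one above can be collected by running a plain net/http pprof listener next to the fasthttp server; this is a minimal sketch, with the :6060 listener address being an illustrative assumption:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux

	"github.com/valyala/fasthttp"
)

func main() {
	// fasthttp does not use net/http handlers, so pprof gets its own listener.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	handler := func(ctx *fasthttp.RequestCtx) {
		ctx.SetStatusCode(fasthttp.StatusOK)
	}
	if err := fasthttp.ListenAndServe(":3000", handler); err != nil {
		log.Fatal(err)
	}
}

The profile can then be read with: go tool pprof -top http://localhost:6060/debug/pprof/heap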

ghostdevxd avatar Jul 13 '25 06:07 ghostdevxd

That makes sense. The defaults for fasthttp are not about being safe, they are about being fast.

If you want to prevent something like this you should set Server.MaxConnsPerIP and Server.ReadTimeout.
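
A minimal sketch of such a configuration (the handler is the one from the report above; the specific limits and timeouts are illustrative assumptions, not recommendations):

package main

import (
	"time"

	"github.com/valyala/fasthttp"
)

func main() {
	requestHandler := func(ctx *fasthttp.RequestCtx) {
		ctx.SetStatusCode(fasthttp.StatusOK)
		ctx.SetBody([]byte("Hello, World!"))
	}

	s := &fasthttp.Server{
		Handler:       requestHandler,
		ReadTimeout:   5 * time.Second,  // time budget for reading the whole request; not reset between reads
		WriteTimeout:  5 * time.Second,
		IdleTimeout:   10 * time.Second, // idle keep-alive connections are closed after this
		MaxConnsPerIP: 50,               // caps how many sockets a single client IP can hold open
	}
	if err := s.ListenAndServe(":3000"); err != nil {
		panic(err)
	}
}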

erikdubbelboer avatar Jul 14 '25 08:07 erikdubbelboer

That makes sense. The defaults for fasthttp are not about being safe, they are about being fast.

If you want to prevent something like this you should set Server.MaxConnsPerIP and Server.ReadTimeout.

IdleTimeout, ReadTimeout etc. settings do not help; ram usage still suddenly increases and the server crashes. Frameworks such as net/http and silverlining (as fast as fasthttp) do not have this problem.

ghostdevxd avatar Jul 14 '25 18:07 ghostdevxd

Did you test it with ReadTimeout? It doesn't get reset between reads, so it should prevent a client from sending one byte at a time to keep a connection open for long. That combined with MaxConnsPerIP should completely stop the attack.

erikdubbelboer avatar Jul 15 '25 06:07 erikdubbelboer

If you're only concerned about memory: that is normal, fasthttp will reuse memory and keep buffers for that even after connections have been closed. Memory usage will only go down after the system runs out of memory and Go starts releasing the memory.
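
One way to check whether that memory is merely cached by the Go runtime rather than leaked is to force a release and compare; a small sketch, assuming it runs in the server process (the helper name is made up for illustration):

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// reportHeap prints how much heap Go is still holding versus how much has
// already been returned to the OS.
func reportHeap(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapInuse=%dKB HeapIdle=%dKB HeapReleased=%dKB\n",
		label, m.HeapInuse/1024, m.HeapIdle/1024, m.HeapReleased/1024)
}

func main() {
	reportHeap("before")
	// Run a GC and return as much memory as possible to the OS.
	// If RSS drops sharply afterwards, the memory was cached, not leaked.
	debug.FreeOSMemory()
	reportHeap("after FreeOSMemory")
}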

erikdubbelboer avatar Jul 15 '25 06:07 erikdubbelboer

Read Timeout (3 * time.Second):

Goloris Error:

2025/07/15 10:28:52 Unexpected response read from the server: [HTTP/1.1 408 Request Timeout
Server: fasthttp
Date: Tue, 15 Jul 2025 07:28:51 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 15
Connection: close

Request timeout]
  • Despite the ReadTimeout error response, the attack can still continue.
  • Memory can still be overflowed by opening more connections.

MaxConnsPerIP:

  • Can be bypassed by using proxies

ghostdevxd avatar Jul 15 '25 07:07 ghostdevxd

Stage  Connections  RAM Usage
1      3000         2 GB
2      3000         2 GB
3      6000         4 GB

it is still possible to carry out the attack by increasing the number of connections

ghostdevxd avatar Jul 15 '25 07:07 ghostdevxd


package main

import (
	"fmt"
	"os"
	"time"

	"github.com/gofiber/fiber/v2"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Println("Usage: ./webserver <IP>")
		os.Exit(1)
	}
	ip := os.Args[1]
	app := fiber.New(fiber.Config{
		DisableStartupMessage:    false,
		DisableHeaderNormalizing: true,
		IdleTimeout:              5 * time.Second,
		Concurrency:              999999999,
		ReadTimeout:              6 * time.Second,
		WriteTimeout:             6 * time.Second,
		ReadBufferSize:           18024,
		WriteBufferSize:          18024,
	})

	app.Get("/", func(c *fiber.Ctx) error {
		return c.SendString("Hello, World!")
	})

	addr := fmt.Sprintf("%s:8000", ip)
	if err := app.Listen(addr); err != nil {
		fmt.Printf("Error starting server: %v\n", err)
		os.Exit(1)
	}
}

i tried go fiber since i saw you actually posted this problem in the gofiber discord, and setting the read and write timeouts 100% stopped it. as for your proxy problem, that's just straight up a ddos attack, so you would of course need a firewall to stop that, limit connections per ip, etc. in this case it would be user error and nothing to do with fasthttp

guno1928 avatar Jul 19 '25 09:07 guno1928



didn't work, the problem persists with FastHTTP (Image). ReadTimeout etc. impose a certain limit, but slowloris still overflows ram when you increase the number of connections

and the whole problem is caused by fasthttp; silverlining (which is fast like fasthttp) and other slower frameworks do not have this problem

ghostdevxd avatar Jul 19 '25 10:07 ghostdevxd

Related

  • https://github.com/valyala/fasthttp/issues/667
  • #2032

gaby avatar Jul 19 '25 12:07 gaby

@ghostdevxd Try setting max body limit:

srv := &fasthttp.Server{
    // refuse anything above 1 MiB
    MaxRequestBodySize: 1 << 20,
}

gaby avatar Jul 19 '25 12:07 gaby

@ghostdevxd Try setting max body limit:

srv := &fasthttp.Server{MaxRequestBodySize: 1 << 20} // refuse anything above 1 MiB

it doesn't make sense to limit it, the problem persists, it only makes the attack more difficult

ghostdevxd avatar Jul 19 '25 12:07 ghostdevxd


That's what I wanted to confirm, that it makes it more difficult. The default limit is 4MB.

gaby avatar Jul 19 '25 13:07 gaby

Can you show exactly which command line flags you use with goloris? I want to see if I can replicate this and see what is going on.

erikdubbelboer avatar Jul 19 '25 14:07 erikdubbelboer

@erikdubbelboer I asked the OpenAI o3 model about this and it suggested this:

Problem:

bodyBuf.B, err = readBody(r, contentLength, maxBodySize, bodyBuf.B)

This will allocate the full content length up front, so it suggested changing appendBodyFixedSize to the following:

func appendBodyFixedSize(r *bufio.Reader, dst []byte, n int) ([]byte, error) {
    const step = 32 * 1024            // grow in small chunks
    for read := 0; read < n; {
        need := n - read
        if need > step { need = step }

        // ensure capacity only for the incoming chunk
        if len(dst)+need > cap(dst) {
            dst = append(dst, make([]byte, need)...)
        } else {
            dst = dst[:len(dst)+need]
        }

        if _, err := io.ReadFull(r, dst[len(dst)-need:]); err != nil {
            return dst[:len(dst)-need], err       // early EOF, etc.
        }
        read += need
    }
    return dst, nil
}

It's a solution, although probably not the best one since it allocates so often
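
To put numbers on that, a rough benchmark sketch could compare allocations between the variants; this assumes the chunked version above is copied into the fasthttp package under the hypothetical name appendBodyFixedSizeChunked:

package fasthttp

import (
	"bufio"
	"bytes"
	"testing"
)

// BenchmarkAppendBodyFixedSizeChunked measures CPU and allocations when reading
// a 4 MiB body with the chunked variant (appendBodyFixedSizeChunked is a
// hypothetical local copy of the suggestion above).
func BenchmarkAppendBodyFixedSizeChunked(b *testing.B) {
	body := bytes.Repeat([]byte("x"), 4<<20)
	b.ReportAllocs()
	b.SetBytes(int64(len(body)))
	for i := 0; i < b.N; i++ {
		r := bufio.NewReader(bytes.NewReader(body))
		if _, err := appendBodyFixedSizeChunked(r, nil, len(body)); err != nil {
			b.Fatal(err)
		}
	}
}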

gaby avatar Jul 19 '25 15:07 gaby

Can you show exactly which command line flags you use with goloris? I want to see if I can replicate this and see what is going on.

go run goloris.go -dialWorkersCount 150

It defaults to localhost, and once you run this the ram will build up to around 100-150 mb. you can then stop it, wait 5 seconds, start it back up, and the ram will grow again; repeat, or open 3 terminals and do it at the same time and it grows faster

guno1928 avatar Jul 19 '25 20:07 guno1928

This program runs a fasthttp server and runs goloris over and over again to see if it increases memory usage:

package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"runtime"
	"syscall"
	"time"

	"github.com/valyala/fasthttp"
)

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, os.Kill)
	defer cancel()

	requestHandler := func(ctx *fasthttp.RequestCtx) {
		ctx.SetStatusCode(fasthttp.StatusOK)
	}

	s := &fasthttp.Server{
		Handler:      requestHandler,
		ReadTimeout:  3 * time.Second,
		WriteTimeout: 3 * time.Second,
	}

	go func() {
		for {
			time.Sleep(time.Second)

			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			allocated := m.Alloc / 1024
			fmt.Printf("memory: %dkb  connections: %d\n", allocated, s.GetOpenConnectionsCount())
		}
	}()

	go func() {
		if err := s.ListenAndServe(":3000"); err != nil {
			panic(err)
		}
	}()

	go func() {
		<-ctx.Done()

		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		if err := s.ShutdownWithContext(ctx); err != nil {
			panic(err)
		}
	}()

	if err := os.Chdir(os.Getenv("GOPATH") + "/src/github.com/valyala/goloris"); err != nil {
		panic(err)
	}

	for {
		goloris := exec.Command("go", "run", "goloris.go", "-victimUrl", "http://localhost:3000", "-dialWorkersCount", "150") //nolint:lll
		goloris.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
		//goloris.Stdout = os.Stdout
		//goloris.Stderr = os.Stderr
		if err := goloris.Start(); err != nil {
			panic(err)
		}
		time.Sleep(time.Second * 6)
		if err := syscall.Kill(-goloris.Process.Pid, syscall.SIGKILL); err != nil {
			panic(err)
		}

		select {
		case <-time.After(5 * time.Second):
		case <-ctx.Done():
			return
		}
	}
}

When i run this I get:

% go run main.go
memory: 249kb  connections: 0
memory: 129121kb  connections: 125
memory: 283756kb  connections: 275
memory: 438380kb  connections: 425
memory: 466407kb  connections: 450
memory: 466673kb  connections: 407
memory: 467086kb  connections: 0
memory: 467086kb  connections: 0
memory: 467086kb  connections: 0
memory: 467087kb  connections: 0
memory: 467094kb  connections: 0
memory: 467095kb  connections: 0
memory: 467130kb  connections: 121
memory: 467175kb  connections: 271
memory: 467221kb  connections: 421
memory: 476576kb  connections: 450
memory: 476693kb  connections: 406
memory: 476951kb  connections: 0
memory: 476951kb  connections: 0
memory: 476951kb  connections: 0
memory: 476951kb  connections: 0
memory: 476959kb  connections: 0
memory: 476959kb  connections: 0
memory: 476994kb  connections: 120
memory: 477037kb  connections: 269
memory: 477085kb  connections: 419
memory: 477167kb  connections: 449
memory: 477259kb  connections: 449
memory: 477521kb  connections: 0
memory: 477521kb  connections: 0
memory: 477521kb  connections: 0
memory: 477521kb  connections: 0
memory: 477529kb  connections: 0
memory: 477529kb  connections: 0
memory: 477564kb  connections: 119
memory: 477607kb  connections: 269
memory: 477651kb  connections: 419
memory: 477732kb  connections: 449
memory: 478032kb  connections: 94
memory: 478091kb  connections: 0
memory: 478091kb  connections: 0
memory: 478091kb  connections: 0
memory: 478091kb  connections: 0
memory: 478099kb  connections: 0
memory: 478099kb  connections: 0
memory: 478135kb  connections: 123
memory: 478178kb  connections: 273
memory: 478222kb  connections: 423
memory: 478305kb  connections: 449
memory: 478617kb  connections: 39
memory: 478645kb  connections: 0
memory: 478645kb  connections: 0
memory: 478645kb  connections: 0
memory: 478645kb  connections: 0
memory: 478653kb  connections: 0
memory: 478653kb  connections: 0
memory: 478689kb  connections: 124
memory: 478733kb  connections: 274
memory: 478776kb  connections: 424
memory: 478860kb  connections: 448

As expected the memory is being reused as it should and doesn't grow.

@ghostdevxd or @guno1928 can you see if you can modify this program to replicate the conditions you are seeing where memory keeps growing?

erikdubbelboer avatar Jul 20 '25 03:07 erikdubbelboer


Image

i ran your code and it shoots up to 4.8 gigs of ram and just sits there, your code does not increase past that

guno1928 avatar Jul 20 '25 03:07 guno1928

@guno1928 can you paste a longer output here? In your screenshot the memory seems quite stable.

erikdubbelboer avatar Jul 20 '25 03:07 erikdubbelboer

@guno1928 can you paste a longer output here? In your screenshot the memory seems quite stable.

hello sorry i did more testing and it sits at a stable 4.8 gigs and does not go above that

guno1928 avatar Jul 20 '25 03:07 guno1928

@erikdubbelboer i tried gaby's potential fix

func appendBodyFixedSize(r *bufio.Reader, dst []byte, n int) ([]byte, error) {
    const step = 32 * 1024            // grow in small chunks
    for read := 0; read < n; {
        need := n - read
        if need > step { need = step }

        // ensure capacity only for the incoming chunk
        if len(dst)+need > cap(dst) {
            dst = append(dst, make([]byte, need)...)
        } else {
            dst = dst[:len(dst)+need]
        }

        if _, err := io.ReadFull(r, dst[len(dst)-need:]); err != nil {
            return dst[:len(dst)-need], err       // early EOF, etc.
        }
        read += need
    }
    return dst, nil
}

and got very good results, with ram going up way slower instead of instantly shooting up to 4 GB of ram

Image

this is with gaby's idea, running with a dial worker count of 6k

guno1928 avatar Jul 20 '25 03:07 guno1928

You're trading memory for CPU there. This solution uses more CPU and generates more garbage (which also uses more CPU).

Fasthttp always makes the tradeoff of using more memory to reduce CPU.

erikdubbelboer avatar Jul 20 '25 04:07 erikdubbelboer

You're trading memory for CPU there. This solution uses more CPU and generates more garbage (which also uses more CPU).

Fasthttp always makes the tradeoff of using more memory to reduce CPU.

Image

^^ with no fix

Image

^^ with gaby's fix, but using const step = 64 * 1024 instead of const step = 32 * 1024

both ways tested with

go run main.go -victimUrl http://127.0.0.1:8000 -dialWorkersCount 400

guno1928 avatar Jul 20 '25 04:07 guno1928

@guno1928 Can you try this one:

// appendBodyFixedSize reads exactly n bytes from r and appends them to dst.
//
// For bodies ≤ 64 KiB it behaves exactly like the old code (single allocation).
// For larger bodies it caps the first allocation at 1 MiB and then doubles
// capacity when needed, so up-front RAM is bounded while keeping re-allocs ≈ log₂.
//
// Tunables can be promoted to package vars if users need different trade-offs.
func appendBodyFixedSize(r *bufio.Reader, dst []byte, n int) ([]byte, error) {
	const (
		smallBody    = 64 << 10  // ≤64 KiB → old fast-path
		maxFirst     = 1 << 20   // never allocate more than 1 MiB up-front
		chunk        = 32 << 10  // bytes read per loop iteration
		growthFactor = 2         // slice cap multiplier
	)

	if n <= 0 {
		return dst, nil
	}

	/* ---------- fast-path: unchanged behaviour ---------- */
	if n <= smallBody {
		total := len(dst) + n
		if cap(dst) < total {
			b := make([]byte, roundUpForSliceCap(total))
			copy(b, dst)
			dst = b
		}
		dst = dst[:total]
		_, err := io.ReadFull(r, dst[len(dst)-n:])
		return dst, err
	}

	// make sure we have at most 1 MiB more than we already had
	want := len(dst) + maxFirst
	if want > len(dst)+n {
		want = len(dst) + n
	}
	if cap(dst) < want {
		b := make([]byte, want)
		copy(b, dst)
		dst = b[:len(dst)]
	}

	remain := n
	for remain > 0 {
		step := chunk
		if remain < step {
			step = remain
		}

		// Ensure capacity (geometric growth keeps realloc-count ≈ log₂).
		if cap(dst)-len(dst) < step {
			need := len(dst) + step
			newCap := cap(dst) * growthFactor
			if newCap < need {
				newCap = roundUpForSliceCap(need)
			}
			b := make([]byte, newCap)
			copy(b, dst)
			dst = b[:len(dst)]
		}

		dst = dst[:len(dst)+step]
		if _, err := io.ReadFull(r, dst[len(dst)-step:]); err != nil {
			// return everything read so far
			return dst[:len(dst)-step], err
		}
		remain -= step
	}
	return dst, nil
}

gaby avatar Jul 20 '25 04:07 gaby


this still shot the ram up instantly

guno1928 avatar Jul 20 '25 04:07 guno1928

@guno1928 I got one last option:

// appendBodyFixedSize reads exactly n bytes from r and appends them to dst.
//
//   • 0 B allocated before we receive the first byte.
//   • Body ≤ 64 KiB → old single-allocation fast-path.
//   • > 64 KiB     → grow slice in 32 KiB steps.
func appendBodyFixedSize(r *bufio.Reader, dst []byte, n int) ([]byte, error) {
    if n <= 0 {
        return dst, nil
    }

    const (
        smallBody = 64 << 10  // keep old behaviour for small JSON etc.
        chunk     = 32 << 10  // read size per iteration
    )

    /* ---------- old zero-copy path, still fastest ---------- */
    if n <= smallBody {
        total := len(dst) + n
        if cap(dst) < total {
            b := make([]byte, roundUpForSliceCap(total))
            copy(b, dst)
            dst = b
        }
        dst = dst[:total]
        _, err := io.ReadFull(r, dst[len(dst)-n:])
        return dst, err
    }

    /* ---------- incremental growth ---------- */
    remain := n
    for remain > 0 {
        step := chunk
        if remain < step {
            step = remain
        }

        // Grow dst by one chunk up front, then read directly into the new space.
        cur := len(dst)
        dst = append(dst, make([]byte, step)...)

        if _, err := io.ReadFull(r, dst[cur:]); err != nil {
            return dst[:cur], err           // early EOF etc.
        }
        remain -= step
    }
    return dst, nil
}

gaby avatar Jul 20 '25 04:07 gaby


Image

used go run main.go -victimUrl http://127.0.0.1:8000 -dialWorkersCount 400

i have tested the speed with the changes

Image

with no fix ^^

Image

^^ with gaby latest fix

guno1928 avatar Jul 20 '25 04:07 guno1928

Awesome 💪💪💪 @erikdubbelboer if it's good with you I can submit a PR

gaby avatar Jul 20 '25 05:07 gaby

Yes, that sounds good. See if you can try doing this with bytebufferpool so the buffers that end up being too small still get reused.
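
For context, bytebufferpool's basic pattern is Get/Put around a reusable byte slice; a minimal usage sketch, not tied to the fasthttp internals:

package main

import (
	"fmt"

	"github.com/valyala/bytebufferpool"
)

func main() {
	// Take a buffer from the pool, grow its underlying slice, then return it
	// so the (possibly enlarged) backing array can be reused later.
	buf := bytebufferpool.Get()
	buf.B = append(buf.B, "hello"...)
	fmt.Println(buf.String())
	bytebufferpool.Put(buf)
}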

erikdubbelboer avatar Jul 20 '25 05:07 erikdubbelboer

@guno1928 I got one last option:


it largely prevents memory leaks

ghostdevxd avatar Jul 20 '25 09:07 ghostdevxd

@guno1928 Can you try this one:

import (
    "bufio"
    "io"

    "github.com/valyala/bytebufferpool"
)

// appendBodyFixedSize reads exactly n bytes from r and appends them to dst.
//
//   • 0 B is reserved for the body until the first byte actually arrives.
//   • Bodies ≤ 64 KiB keep the old single-allocation, zero-copy fast-path.
//   • Larger bodies grow dst with append() in 32 KiB steps.
//   • A 32 KiB scratch slice is from bytebufferpool
//     and returned via defer, so it never hits the garbage collector.
//
// Returned slice aliases the backing array of dst plus the newly-received data.
func appendBodyFixedSize(r *bufio.Reader, dst []byte, n int) ([]byte, error) {
    if n <= 0 {
        return dst, nil
    }

    const (
        smallBody = 64 << 10  // ≤64 KiB → keep original behaviour
        chunk     = 32 << 10  // size of each read and pool buffer
    )

    /* ---------- common small requests ---------- */
    if n <= smallBody {
        total := len(dst) + n
        if cap(dst) < total {
            b := make([]byte, total)      // tiny; no need for pool here
            copy(b, dst)
            dst = b
        }
        dst = dst[:total]
        _, err := io.ReadFull(r, dst[len(dst)-n:])
        return dst, err
    }

    scratch := bytebufferpool.Get()
    defer bytebufferpool.Put(scratch)

    if cap(scratch.B) < chunk {
        scratch.B = make([]byte, chunk)   // first time it may be empty
    }
    buf := scratch.B[:chunk]

    remain := n
    for remain > 0 {
        step := chunk
        if remain < step {
            step = remain
            buf = buf[:step]              // shrink for the final partial read
        }

        if _, err := io.ReadFull(r, buf); err != nil {
            return dst, err               // early EOF, timeout, etc.
        }
        dst = append(dst, buf[:step]...)  // one extra copy – acceptable
        remain -= step
    }
    return dst, nil
}

gaby avatar Jul 20 '25 11:07 gaby