
High memory usage when inserting large strings into ClickHouse

Open · ilevd opened this issue 4 months ago · 1 comment

It seems that the client consumes a lot of memory when inserting large strings into ClickHouse. Here are the steps to reproduce:

  1. Create 2000 strings of 100000 characters each, about 200 MB in total.
  2. Run /usr/bin/time -v ./test to measure memory usage. On my PC: Maximum resident set size (kbytes): 253960. That seems OK.
  3. Uncomment the insert call, rebuild, and run /usr/bin/time -v ./test again. Now on my PC: Maximum resident set size (kbytes): 1140576, i.e. more than 1.1 GB.
  4. Check the uncompressed size of the table in ClickHouse: test table 190.74 MiB.
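
As a cross-check from inside the process, Go's own heap statistics can be printed around the insert. This isn't part of the repro below, just a sketch using the standard library's runtime.ReadMemStats (it would need "runtime" added to the imports, and calls before and after insertRows):

// Sketch: print Go heap statistics before/after the insert, to compare
// against the /usr/bin/time numbers. ReadMemStats stops the world, but
// that is fine for a one-off measurement like this.
func printMem(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%d MiB, Sys=%d MiB\n",
		label, m.HeapAlloc>>20, m.Sys>>20)
}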

ClickHouse table:

CREATE TABLE IF NOT EXISTS test.table
(
    a String
)
ENGINE = MergeTree()
ORDER BY a

Checking table size:

SELECT
    database,
    table,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size
FROM system.parts
WHERE active = 1 AND database = 'test' AND table = 'table'
GROUP BY
    database,
    table
ORDER BY sum(data_uncompressed_bytes) DESC;

test.go

package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"time"

	"github.com/ClickHouse/ch-go"
	"github.com/ClickHouse/ch-go/proto"
)

var r = rand.New(rand.NewSource(time.Now().UnixNano()))

// getChar returns a random lowercase ASCII letter as an int.
func getChar() int {
	return r.Intn(26) + int('a')
}

// makeString builds a random lowercase string of the given size.
func makeString(size int) string {
	b := make([]byte, size)
	for i := range size {
		b[i] = byte(getChar())
	}
	return string(b)
}

// makeStrings builds count random strings of strLen characters each.
func makeStrings(count int, strLen int) []string {
	strs := make([]string, count)
	for i := range count {
		strs[i] = makeString(strLen)
	}
	return strs
}

// insertRows sends all rows to test.table as a single block.
func insertRows(rows []string) {
	ctx := context.Background()
	conn, err := ch.Dial(ctx, ch.Options{
		Address: "localhost:9000"})
	if err != nil {
		log.Fatal("Cannot connect to ClickHouse: ", err)
	}
	defer conn.Close()
	var a proto.ColStr
	for _, row := range rows {
		a.Append(row)
	}
	input := proto.Input{
		{Name: "a", Data: a},
	}
	err = conn.Do(context.Background(), ch.Query{
		Body:  "INSERT INTO test.table VALUES",
		Input: input,
	})
	if err != nil {
		log.Println("Cannot write to Clickhouse", err)
	}
}

func main() {
	count := 2000
	strLen := 100000
	strs := makeStrings(count, strLen) // Create strings (~200 MB total)
	fmt.Println(len(strs), len(strs[0]))
	fmt.Println("Insert to Clickhouse...") // Testing with and without inserting
	//insertRows(strs)
}

ilevd · Aug 21 '25 15:08

Hello! Thanks for providing details and sample code.

ch-go is able to reuse the buffer for block encoding and for the string column data. If I had to guess, the cost comes from resizing the string and block buffers: if you preallocate them to the expected size, the client won't have to grow them repeatedly. I see the strings are initialized and then appended; when they're appended they get copied into a different buffer. I'm not sure exactly what Go is doing under the hood here, but you can get a better idea by using pprof. It all depends on when the garbage collector runs and when the memory behind the unused strings and old buffers actually gets freed.
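
For reference, wiring pprof into the repro only takes the standard net/http/pprof handler. A minimal sketch (my suggestion, not something in the repro):

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func init() {
	// Serve the profiling endpoints in the background. While the program is
	// running (keep main alive long enough, e.g. with a time.Sleep), grab a
	// heap profile with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}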

For large strings I would expect the data to exist in up to three places at once, depending on when garbage is collected:

  • Once when the string itself is allocated
  • Another allocation to size the string column buffer
  • A third allocation to size the encoding buffer

If the buffers aren't preallocated, each of them might also go through multiple resizing steps, depending on how Go decides to grow those byte slices; a sketch of preallocating the column follows below.
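
If I'm reading the proto package right, ColStr exposes its Buf and Pos slices, so the column side can be sized up front along these lines (an untested sketch, assuming those exported fields):

// Sketch: preallocate the string column so Append never has to grow it.
// Assumes proto.ColStr exposes Buf ([]byte, all string bytes back to back)
// and Pos ([]proto.Position, one start/end offset pair per row).
total := 0
for _, row := range rows {
	total += len(row)
}
var a proto.ColStr
a.Buf = make([]byte, 0, total)
a.Pos = make([]proto.Position, 0, len(rows))
for _, row := range rows {
	a.Append(row) // copies into the preallocated Buf, records offsets in Pos
}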

I could be wrong, and there might be some code I'm forgetting, but these are my initial thoughts. What do you think?

SpencerTorres · Aug 21 '25 20:08