clickhouse-go
clickhouse-go copied to clipboard
High memory consumption INSERTing Decimal type
Observed
Hi, I am seeing a big memory usage from the resultFromStatement after making a INSERT query.
I believe the leak is happening here: https://github.com/ClickHouse/clickhouse-go/blob/28fd6a4954a5dbf09b7dcc0fcce597beb2dd0b58/lib/column/decimal.go#L238
See graph below.
This happens when I insert millions of rows with Decimal & Int64 values.
The decimal values are made using github.com/shopspring/decimal, with decimal.NewFromString.
I am actually not using any of the result since I am making an INSERT. Not sure why it's appending from result and taking so much memory.
Here is the pprof output:
$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/fritz/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
Type: inuse_space
Time: May 8, 2024 at 9:45am (-03)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 472.25MB, 97.42% of 484.75MB total
Dropped 52 nodes (cum <= 2.42MB)
Showing top 20 nodes out of 28
flat flat% sum% cum cum%
296.42MB 61.15% 61.15% 296.42MB 61.15% github.com/ClickHouse/ch-go/proto.(*ColDecimal128).Append (inline)
138.53MB 28.58% 89.73% 138.53MB 28.58% github.com/ClickHouse/ch-go/proto.(*ColInt64).Append (inline)
33.01MB 6.81% 96.54% 33.01MB 6.81% github.com/ClickHouse/ch-go/proto.(*ColUInt8).Append (inline)
Expected behaviour
Should not take up much memory after an INSERT call.
Code example
See https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/database/database_clickhouse.go#L157
insertStatement := conn.GenerateInsertStatement(
table.FullName(),
insFields,
1,
)
stmt, err := conn.Prepare(insertStatement)
if err != nil {
g.Trace("%s: %#v", table, columns.Names())
return g.Error(err, "could not prepare statement")
}
decimalCols := []int{}
intCols := []int{}
int64Cols := []int{}
floatCols := []int{}
for row := range batch.Rows {
var eG g.ErrorGroup
// set decimals correctly
for _, colI := range decimalCols {
if row[colI] != nil {
val, err := decimal.NewFromString(cast.ToString(row[colI]))
if err == nil {
row[colI] = val
}
eG.Capture(err)
}
}
// set Int32 correctly
for _, colI := range intCols {
if row[colI] != nil {
row[colI], err = cast.ToIntE(row[colI])
eG.Capture(err)
}
}
// set Int64 correctly
for _, colI := range int64Cols {
if row[colI] != nil {
row[colI], err = cast.ToInt64E(row[colI])
eG.Capture(err)
}
}
// set Float64 correctly
for _, colI := range floatCols {
if row[colI] != nil {
row[colI], err = cast.ToFloat64E(row[colI])
eG.Capture(err)
}
}
if err = eG.Err(); err != nil {
err = g.Error(err, "could not convert value for COPY into table %s", tableFName)
ds.Context.CaptureErr(err)
return err
}
count++
// Do insert
ds.Context.Lock()
_, err := stmt.Exec(row...)
ds.Context.Unlock()
if err != nil {
ds.Context.CaptureErr(g.Error(err, "could not COPY into table %s", tableFName))
g.Trace("error for row: %#v", row)
return g.Error(err, "could not execute statement")
}
}
Environment
- [x]
clickhouse-goversion: v2.24.0 - [x] Interface:
database/sql - [X] Go version: 1.21.1
- [X] Operating system: mac
- [X] ClickHouse version:
yandex/clickhouse-server:21.3 - [X] Is it a ClickHouse Cloud? No
- [X]
CREATE TABLEstatements for tables involved:
create table `default`.`tpcds_store_sales_tmp` (`ss_sold_date_sk` Nullable(Int64),
`ss_sold_time_sk` Nullable(Int64),
`ss_item_sk` Nullable(Int64),
`ss_customer_sk` Nullable(Int64),
`ss_cdemo_sk` Nullable(Int64),
`ss_hdemo_sk` Nullable(Int64),
`ss_addr_sk` Nullable(Int64),
`ss_store_sk` Nullable(Int64),
`ss_promo_sk` Nullable(Int64),
`ss_ticket_number` Nullable(Int64),
`ss_quantity` Nullable(Int64),
`ss_wholesale_cost` Nullable(Decimal(24,6)),
`ss_list_price` Nullable(Decimal(24,6)),
`ss_sales_price` Nullable(Decimal(24,6)),
`ss_ext_discount_amt` Nullable(Decimal(24,6)),
`ss_ext_sales_price` Nullable(Decimal(24,6)),
`ss_ext_wholesale_cost` Nullable(Decimal(24,6)),
`ss_ext_list_price` Nullable(Decimal(24,6)),
`ss_ext_tax` Nullable(Decimal(24,6)),
`ss_coupon_amt` Nullable(Decimal(24,6)),
`ss_net_paid` Nullable(Decimal(24,6)),
`ss_net_paid_inc_tax` Nullable(Decimal(24,6)),
`ss_net_profit` Nullable(Decimal(24,6))) engine=MergeTree ORDER BY tuple()
@jkaflik any timeframe on this?
@flarco, thanks for reporting this. There is definitely some unexpected behavior here. Could you clarify on:
I am actually not using any of the result since I am making an INSERT.
please?
@alisman initially assigned me to triage. Currently, we don't have the capacity to have a look shortly. Any contributions are welcome.
Limiting the number of rows inserted per transaction helps a lot with lowering the memory usage. So that works as a work-around.
The issue stands, though: if millions of rows are inserted in 1 transaction, the memory leakage crashes the process.
Same thing for us. We are inserting around 1 mln records per 1 insert (we can't decrease it unfortunately, too much data) and memory consumption is enourmous.
@gofort is your case also related to Decimal type? Could you provide a bit of details:
- number of rows insert per batch
- target data types
- Go data types
@flarco @alisman could you confirm your request is strictly related to Decimal type? What is number of rows?
@jkaflik I'll answer you instead of @gofort
We insert 1M records, each record has 52 columns.
Here is the structure we pass to method batch.AppendStruct(item)
type Item struct {
FieldA time.Time `ch:"field_a"`
FieldB time.Time `ch:"field_b"`
FieldC net.IP `ch:"field_c"`
FieldD string `ch:"field_d"`
FieldE string `ch:"field_e"`
FieldF string `ch:"field_f"`
FieldG string `ch:"field_g"`
FieldH string `ch:"field_h"`
FieldI uint16 `ch:"field_i"`
FieldJ int64 `ch:"field_j"`
FieldK string `ch:"field_k"`
FieldL string `ch:"field_l"`
FieldM int64 `ch:"field_m"`
FieldN string `ch:"field_n"`
FieldO uint32 `ch:"field_o"`
FieldP string `ch:"field_p"`
FieldQ []uint32 `ch:"field_q"`
FieldR []int64 `ch:"field_r"`
FieldS string `ch:"field_s"`
FieldT []uint16 `ch:"field_t"`
FieldU []uint32 `ch:"field_u"`
FieldV []uint32 `ch:"field_v"`
FieldW int32 `ch:"field_w"`
FieldX int32 `ch:"field_x"`
FieldY string `ch:"field_y"`
FieldZ net.IP `ch:"field_z"`
FieldAA string `ch:"field_aa"`
FieldAB string `ch:"field_ab"`
FieldAC string `ch:"field_ac"`
FieldAD uint32 `ch:"field_ad"`
FieldAE string `ch:"field_ae"`
FieldAF string `ch:"field_af"`
FieldAG string `ch:"field_ag"`
FieldAH string `ch:"field_ah"`
FieldAI string `ch:"field_ai"`
FieldAJ string `ch:"field_aj"`
FieldAK string `ch:"field_ak"`
FieldAL string `ch:"field_al"`
FieldAM string `ch:"field_am"`
FieldAN string `ch:"field_an"`
FieldAO uint8 `ch:"field_ao"`
FieldAP string `ch:"field_ap"`
FieldAQ []net.IP `ch:"field_aq"`
FieldAR uint64 `ch:"field_ar"`
FieldAS string `ch:"field_as"`
FieldAT uint32 `ch:"field_at"`
FieldAU uint32 `ch:"field_au"`
FieldAV string `ch:"field_av"`
FieldAW uint16 `ch:"field_aw"`
FieldAX uint16 `ch:"field_ax"`
FieldAY int8 `ch:"field_ay"`
FieldAZ string `ch:"field_az"`
}
The table in the database contains the following columns:
`field_a` DateTime('UTC'),
`field_b` Date,
`field_c` IPv6,
`field_d` LowCardinality(String),
`field_e` String,
`field_f` String,
`field_g` LowCardinality(String),
`field_h` LowCardinality(String),
`field_i` UInt16,
`field_j` Int64,
`field_k` String,
`field_l` String,
`field_m` Int64,
`field_n` String,
`field_o` UInt32,
`field_p` LowCardinality(String),
`field_q` Array(UInt32),
`field_r` Array(Int64),
`field_s` String,
`field_t` Array(UInt16),
`field_u` Array(UInt32),
`field_v` Array(UInt32),
`field_w` Int32,
`field_x` Int32,
`field_y` LowCardinality(String),
`field_z` IPv6,
`field_aa` String,
`field_ab` LowCardinality(String),
`field_ac` LowCardinality(String),
`field_ad` Nullable(UInt32),
`field_ae` LowCardinality(String),
`field_af` LowCardinality(String),
`field_ag` String,
`field_ah` LowCardinality(String),
`field_ai` LowCardinality(String),
`field_aj` LowCardinality(String),
`field_ak` LowCardinality(String),
`field_al` LowCardinality(String),
`field_am` LowCardinality(String),
`field_an` LowCardinality(String),
`field_ao` UInt8,
`field_ap` LowCardinality(String),
`field_aq` Array(IPv6),
`field_ar` UInt64,
`field_as` LowCardinality(String),
`field_at` UInt32,
`field_au` UInt32,
`field_av` LowCardinality(String),
`field_aw` Nullable(UInt16),
`field_ax` Nullable(UInt32),
`field_ay` Int8,
`field_az` LowCardinality(String)
@vkazmirchuk what is memory usage you notice? Do you have pprof memory report? I strongly recommend to create another issue. I want this one to be strictly focused on Decimal type support as originally mentioned by issue creator.
Thanks! I created issue https://github.com/ClickHouse/clickhouse-go/issues/1384