
High memory consumption INSERTing Decimal type

Open flarco opened this issue 1 year ago • 8 comments

Observed

Hi, I am seeing big memory usage from resultFromStatement after making an INSERT query. I believe the leak is happening here: https://github.com/ClickHouse/clickhouse-go/blob/28fd6a4954a5dbf09b7dcc0fcce597beb2dd0b58/lib/column/decimal.go#L238

See the graph below. This happens when I insert millions of rows with Decimal & Int64 values. The decimal values are created with github.com/shopspring/decimal, via decimal.NewFromString. I am not actually using any of the result, since I am making an INSERT, so I am not sure why it is appending to the result and taking up so much memory.
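For reference, a minimal sketch of the insert pattern described above (the DSN, column choice, and row count are illustrative, not the exact sling code):

package main

import (
	"database/sql"
	"log"

	_ "github.com/ClickHouse/clickhouse-go/v2" // registers the "clickhouse" driver
	"github.com/shopspring/decimal"
)

func main() {
	db, err := sql.Open("clickhouse", "clickhouse://localhost:9000/default")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	stmt, err := tx.Prepare("INSERT INTO tpcds_store_sales_tmp (ss_quantity, ss_sales_price) VALUES (?, ?)")
	if err != nil {
		log.Fatal(err)
	}

	for i := 0; i < 5_000_000; i++ {
		price, err := decimal.NewFromString("123.456789")
		if err != nil {
			log.Fatal(err)
		}
		// each Exec appends the values to the driver's in-memory column
		// buffers; nothing is sent to the server until Commit
		if _, err := stmt.Exec(int64(i), price); err != nil {
			log.Fatal(err)
		}
	}
	// with Decimal columns, heap usage peaks here
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}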

Here is the pprof output:

$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/fritz/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
Type: inuse_space
Time: May 8, 2024 at 9:45am (-03)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 472.25MB, 97.42% of 484.75MB total
Dropped 52 nodes (cum <= 2.42MB)
Showing top 20 nodes out of 28
      flat  flat%   sum%        cum   cum%
  296.42MB 61.15% 61.15%   296.42MB 61.15%  github.com/ClickHouse/ch-go/proto.(*ColDecimal128).Append (inline)
  138.53MB 28.58% 89.73%   138.53MB 28.58%  github.com/ClickHouse/ch-go/proto.(*ColInt64).Append (inline)
   33.01MB  6.81% 96.54%    33.01MB  6.81%  github.com/ClickHouse/ch-go/proto.(*ColUInt8).Append (inline)
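For context, the profile was pulled from the standard net/http/pprof endpoint exposed by the inserting process; the wiring is roughly the following sketch (the port matches the fetch command above, and the insert workload is stubbed out):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	go func() {
		// serve live heap profiles at http://localhost:6060/debug/pprof/heap
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	runInserts()
}

func runInserts() { select {} } // hypothetical stand-in for the actual insert workload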

[graph: heap memory usage growing steadily during the INSERT]

Expected behaviour

An INSERT call should not retain significant memory after it completes.

Code example

See https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/database/database_clickhouse.go#L157 — the relevant excerpt follows (variables such as tableFName, count, and the column-index slices are defined elsewhere in the full source):

			insertStatement := conn.GenerateInsertStatement(
				table.FullName(),
				insFields,
				1,
			)

			stmt, err := conn.Prepare(insertStatement)
			if err != nil {
				g.Trace("%s: %#v", table, columns.Names())
				return g.Error(err, "could not prepare statement")
			}

			decimalCols := []int{}
			intCols := []int{}
			int64Cols := []int{}
			floatCols := []int{}
			for row := range batch.Rows {
				var eG g.ErrorGroup

				// set decimals correctly
				for _, colI := range decimalCols {
					if row[colI] != nil {
						val, err := decimal.NewFromString(cast.ToString(row[colI]))
						if err == nil {
							row[colI] = val
						}
						eG.Capture(err)
					}
				}

				// set Int32 correctly
				for _, colI := range intCols {
					if row[colI] != nil {
						row[colI], err = cast.ToIntE(row[colI])
						eG.Capture(err)
					}
				}

				// set Int64 correctly
				for _, colI := range int64Cols {
					if row[colI] != nil {
						row[colI], err = cast.ToInt64E(row[colI])
						eG.Capture(err)
					}
				}

				// set Float64 correctly
				for _, colI := range floatCols {
					if row[colI] != nil {
						row[colI], err = cast.ToFloat64E(row[colI])
						eG.Capture(err)
					}
				}

				if err = eG.Err(); err != nil {
					err = g.Error(err, "could not convert value for COPY into table %s", tableFName)
					ds.Context.CaptureErr(err)
					return err
				}

				count++
				// Do insert
				ds.Context.Lock()
				_, err := stmt.Exec(row...)
				ds.Context.Unlock()
				if err != nil {
					ds.Context.CaptureErr(g.Error(err, "could not COPY into table %s", tableFName))
					g.Trace("error for row: %#v", row)
					return g.Error(err, "could not execute statement")
				}
			}

Environment

  • [x] clickhouse-go version: v2.24.0
  • [x] Interface: database/sql
  • [x] Go version: 1.21.1
  • [x] Operating system: macOS
  • [x] ClickHouse version: yandex/clickhouse-server:21.3
  • [x] Is it a ClickHouse Cloud? No
  • [x] CREATE TABLE statements for tables involved:
create table `default`.`tpcds_store_sales_tmp` (`ss_sold_date_sk` Nullable(Int64),
`ss_sold_time_sk` Nullable(Int64),
`ss_item_sk` Nullable(Int64),
`ss_customer_sk` Nullable(Int64),
`ss_cdemo_sk` Nullable(Int64),
`ss_hdemo_sk` Nullable(Int64),
`ss_addr_sk` Nullable(Int64),
`ss_store_sk` Nullable(Int64),
`ss_promo_sk` Nullable(Int64),
`ss_ticket_number` Nullable(Int64),
`ss_quantity` Nullable(Int64),
`ss_wholesale_cost` Nullable(Decimal(24,6)),
`ss_list_price` Nullable(Decimal(24,6)),
`ss_sales_price` Nullable(Decimal(24,6)),
`ss_ext_discount_amt` Nullable(Decimal(24,6)),
`ss_ext_sales_price` Nullable(Decimal(24,6)),
`ss_ext_wholesale_cost` Nullable(Decimal(24,6)),
`ss_ext_list_price` Nullable(Decimal(24,6)),
`ss_ext_tax` Nullable(Decimal(24,6)),
`ss_coupon_amt` Nullable(Decimal(24,6)),
`ss_net_paid` Nullable(Decimal(24,6)),
`ss_net_paid_inc_tax` Nullable(Decimal(24,6)),
`ss_net_profit` Nullable(Decimal(24,6))) engine=MergeTree  ORDER BY tuple()

— flarco, May 08 '24 12:05

@jkaflik any timeframe on this?

— alisman, May 28 '24 15:05

@flarco, thanks for reporting this. There is definitely some unexpected behavior here. Could you clarify this part:

I am actually not using any of the result since I am making an INSERT.

please?

@alisman initially assigned me to triage. We currently don't have the capacity to look into this soon; any contributions are welcome.

— jkaflik, May 31 '24 08:05

Limiting the number of rows inserted per transaction helps a lot with lowering memory usage, so that works as a workaround.

The issue stands, though: if millions of rows are inserted in one transaction, the memory leak crashes the process.
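In practice the workaround looks roughly like this sketch (database/sql interface; all identifiers are illustrative, not sling's actual code):

import "database/sql"

// insertChunked commits every chunkSize rows so the driver's per-column
// buffers are flushed and released between chunks instead of growing
// for the entire dataset.
func insertChunked(db *sql.DB, insertStatement string, rows [][]any, chunkSize int) error {
	for start := 0; start < len(rows); start += chunkSize {
		end := start + chunkSize
		if end > len(rows) {
			end = len(rows)
		}

		tx, err := db.Begin()
		if err != nil {
			return err
		}
		stmt, err := tx.Prepare(insertStatement)
		if err != nil {
			return err
		}
		for _, row := range rows[start:end] {
			if _, err := stmt.Exec(row...); err != nil {
				tx.Rollback()
				return err
			}
		}
		// Commit sends this chunk to ClickHouse and lets the buffers be freed
		if err := tx.Commit(); err != nil {
			return err
		}
	}
	return nil
}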

— flarco, Jun 01 '24 11:06

Same thing for us. We are inserting around 1 million records per insert (unfortunately we can't decrease that, there is too much data) and memory consumption is enormous.

— gofort, Jul 22 '24 09:07

@gofort is your case also related to the Decimal type? Could you provide a bit more detail:

  • number of rows insert per batch
  • target data types
  • Go data types

@flarco @alisman could you confirm your report is strictly related to the Decimal type? What is the number of rows?

— jkaflik, Aug 21 '24 15:08

@jkaflik I'll answer you instead of @gofort

We insert 1M records, each record has 52 columns.

Here is the struct we pass to batch.AppendStruct(item):

type Item struct {
	FieldA       time.Time `ch:"field_a"`
	FieldB       time.Time `ch:"field_b"`
	FieldC       net.IP    `ch:"field_c"`
	FieldD       string    `ch:"field_d"`
	FieldE       string    `ch:"field_e"`
	FieldF       string    `ch:"field_f"`
	FieldG       string    `ch:"field_g"`
	FieldH       string    `ch:"field_h"`
	FieldI       uint16    `ch:"field_i"`
	FieldJ       int64     `ch:"field_j"`
	FieldK       string    `ch:"field_k"`
	FieldL       string    `ch:"field_l"`
	FieldM       int64     `ch:"field_m"`
	FieldN       string    `ch:"field_n"`
	FieldO       uint32    `ch:"field_o"`
	FieldP       string    `ch:"field_p"`
	FieldQ       []uint32  `ch:"field_q"`
	FieldR       []int64   `ch:"field_r"`
	FieldS       string    `ch:"field_s"`
	FieldT       []uint16  `ch:"field_t"`
	FieldU       []uint32  `ch:"field_u"`
	FieldV       []uint32  `ch:"field_v"`
	FieldW       int32     `ch:"field_w"`
	FieldX       int32     `ch:"field_x"`
	FieldY       string    `ch:"field_y"`
	FieldZ       net.IP    `ch:"field_z"`
	FieldAA      string    `ch:"field_aa"`
	FieldAB      string    `ch:"field_ab"`
	FieldAC      string    `ch:"field_ac"`
	FieldAD      uint32    `ch:"field_ad"`
	FieldAE      string    `ch:"field_ae"`
	FieldAF      string    `ch:"field_af"`
	FieldAG      string    `ch:"field_ag"`
	FieldAH      string    `ch:"field_ah"`
	FieldAI      string    `ch:"field_ai"`
	FieldAJ      string    `ch:"field_aj"`
	FieldAK      string    `ch:"field_ak"`
	FieldAL      string    `ch:"field_al"`
	FieldAM      string    `ch:"field_am"`
	FieldAN      string    `ch:"field_an"`
	FieldAO      uint8     `ch:"field_ao"`
	FieldAP      string    `ch:"field_ap"`
	FieldAQ      []net.IP  `ch:"field_aq"`
	FieldAR      uint64    `ch:"field_ar"`
	FieldAS      string    `ch:"field_as"`
	FieldAT      uint32    `ch:"field_at"`
	FieldAU      uint32    `ch:"field_au"`
	FieldAV      string    `ch:"field_av"`
	FieldAW      uint16    `ch:"field_aw"`
	FieldAX      uint16    `ch:"field_ax"`
	FieldAY      int8      `ch:"field_ay"`
	FieldAZ      string    `ch:"field_az"`
}

The table in the database contains the following columns:

`field_a` DateTime('UTC'),
`field_b` Date,
`field_c` IPv6,
`field_d` LowCardinality(String),
`field_e` String,
`field_f` String,
`field_g` LowCardinality(String),
`field_h` LowCardinality(String),
`field_i` UInt16,
`field_j` Int64,
`field_k` String,
`field_l` String,
`field_m` Int64,
`field_n` String,
`field_o` UInt32,
`field_p` LowCardinality(String),
`field_q` Array(UInt32),
`field_r` Array(Int64),
`field_s` String,
`field_t` Array(UInt16),
`field_u` Array(UInt32),
`field_v` Array(UInt32),
`field_w` Int32,
`field_x` Int32,
`field_y` LowCardinality(String),
`field_z` IPv6,
`field_aa` String,
`field_ab` LowCardinality(String),
`field_ac` LowCardinality(String),
`field_ad` Nullable(UInt32),
`field_ae` LowCardinality(String),
`field_af` LowCardinality(String),
`field_ag` String,
`field_ah` LowCardinality(String),
`field_ai` LowCardinality(String),
`field_aj` LowCardinality(String),
`field_ak` LowCardinality(String),
`field_al` LowCardinality(String),
`field_am` LowCardinality(String),
`field_an` LowCardinality(String),
`field_ao` UInt8,
`field_ap` LowCardinality(String),
`field_aq` Array(IPv6),
`field_ar` UInt64,
`field_as` LowCardinality(String),
`field_at` UInt32,
`field_au` UInt32,
`field_av` LowCardinality(String),
`field_aw` Nullable(UInt16),
`field_ax` Nullable(UInt32),
`field_ay` Int8,
`field_az` LowCardinality(String)
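For completeness, the batch flow around AppendStruct is the standard native-API pattern, roughly like this sketch (the address and table name are assumptions):

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2"
)

// sendItems sketches the flow implied by batch.AppendStruct above.
func sendItems(ctx context.Context, items []Item) error {
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"localhost:9000"},
	})
	if err != nil {
		return err
	}
	defer conn.Close()

	batch, err := conn.PrepareBatch(ctx, "INSERT INTO items")
	if err != nil {
		return err
	}
	for i := range items {
		// all ~1M rows accumulate in the batch's column buffers until Send,
		// which is where memory consumption peaks
		if err := batch.AppendStruct(&items[i]); err != nil {
			return err
		}
	}
	return batch.Send()
}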

— vkazmirchuk, Aug 22 '24 07:08

@vkazmirchuk what memory usage do you see? Do you have a pprof memory report? I strongly recommend creating a separate issue; I want this one to stay strictly focused on Decimal type support, as originally reported by the issue creator.

— jkaflik, Aug 22 '24 07:08

Thanks! I created issue https://github.com/ClickHouse/clickhouse-go/issues/1384

— vkazmirchuk, Aug 23 '24 06:08