clickhouse-go
Allow reusing a batch if append failed
Describe the bug
Steps to reproduce
- Prepare valid and bad data in any order
- Prepare batch
- Call AppendStruct in loop
- See logs
Expected behaviour
Currently it's not possible to tell which row is corrupted; even if only 1 row out of 10000 has an invalid value, it affects all subsequent data in the current batch.
There are 2 possible ways to give some flexibility (see the sketch after this list):
- when AppendStruct detects any problem with the current struct's data, it returns an error and doesn't append the corrupted values to the batch; the developer must decide what to do with that error on their own, but it should be possible to continue and skip those rows
- when AppendStruct detects any problem with the current struct's data, it returns an error and appends the row as it does now, but any subsequent calls of AppendStruct with valid data will succeed
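For illustration, here is a sketch of how option 1 could look from the caller's side, assuming a hypothetical future AppendStruct that rejects a bad row without invalidating the batch (the function name appendSkippingBad is mine; driver.Batch is the existing interface):

import (
	"log"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

type row struct {
	Col1 string
	Col2 time.Time
}

// appendSkippingBad assumes AppendStruct rejects a bad row but leaves the
// batch usable: the caller logs the error, skips the row, and keeps appending.
func appendSkippingBad(batch driver.Batch, data []row) error {
	for i := range data {
		if err := batch.AppendStruct(&data[i]); err != nil {
			log.Printf("skipping row %d: %v", i, err) // row was not appended
			continue
		}
	}
	return batch.Send() // sends only the rows that appended cleanly
}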
Code example
func AppendStructWithBadData() error {
	conn, err := GetNativeConnection(nil, nil, nil)
	if err != nil {
		return err
	}
	ctx := context.Background()
	defer func() {
		_ = conn.Exec(ctx, "DROP TABLE example")
	}()
	if err := conn.Exec(ctx, `DROP TABLE IF EXISTS example`); err != nil {
		return err
	}
	if err := conn.Exec(ctx, `
		CREATE TABLE example (
			Col1 String,
			Col2 DateTime
		) Engine = Memory
	`); err != nil {
		return err
	}
	batch, err := conn.PrepareBatch(ctx, "INSERT INTO example")
	if err != nil {
		return err
	}
	data := []struct {
		Col1 string
		Col2 time.Time
	}{
		{
			Col1: "valid data", // no error
			Col2: time.Now(),
		},
		{
			Col1: "bad data", // error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59
			Col2: time.Time{},
		},
		{
			Col1: "valid data", // error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59: clickhouse: batch is invalid. check appended data is correct
			Col2: time.Now(),
		},
	}
	for i, r := range data {
		err := batch.AppendStruct(&r)
		if err != nil {
			fmt.Printf("AppendStruct failed: index=%d, error=%+v\n", i, err.Error())
		} else {
			fmt.Printf("AppendStruct succeeded: index=%d\n", i)
		}
	}
	fmt.Printf("send batch: rows=%d\n", batch.Rows())
	return batch.Send()
}
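Until something like that is supported, the only safe approach seems to be validating values client-side before appending, so the batch never gets poisoned. A minimal sketch extending the example above, with the DateTime bounds taken from the driver's error message (the helper name validDateTime is mine):

// DateTime bounds copied from the overflow error message above.
var (
	dateTimeMin = time.Date(1970, 1, 1, 0, 0, 0, 0, time.UTC)
	dateTimeMax = time.Date(2105, 12, 31, 23, 59, 59, 0, time.UTC)
)

func validDateTime(t time.Time) bool {
	return !t.Before(dateTimeMin) && !t.After(dateTimeMax)
}

With that, the loop can skip a row before it ever reaches the batch:

for i, r := range data {
	if !validDateTime(r.Col2) {
		fmt.Printf("skipping row %d: Col2 out of DateTime range\n", i)
		continue
	}
	if err := batch.AppendStruct(&r); err != nil {
		return err
	}
}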
Error log
AppendStruct succeeded: index=0
AppendStruct failed: index=1, error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59
AppendStruct failed: index=2, error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59: clickhouse: batch is invalid. check appended data is correct
send batch: rows=2
Error: Received unexpected error:
clickhouse: batch is invalid. check appended data is correct
clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59
Configuration
Environment
- Client version:
- Language version:
- OS:
- Interface: ClickHouse API / database/sql compatible driver
ClickHouse server
- ClickHouse Server version:
- ClickHouse Server non-default settings, if any:
- CREATE TABLE statements for tables involved:
- Sample data for all these tables, use clickhouse-obfuscator if necessary
Hi @JILeXanDR
This is the current behavior of batch insertion. On the very first data append error, the connection is released, and all subsequent append calls will return the previous error.
This error is intentionally wrapped with https://github.com/ClickHouse/clickhouse-go/blob/51cea28b90940b3887266de20b28df6b0e4512ea/clickhouse.go#L46
I agree it might be counter-intuitive, but I also don't see a good solution here. There might be client-side data validation where we can recover (like in this case); however, we cannot recover from errors sent from ClickHouse.
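If the wrap keeps the error chain intact, callers can at least tell the original bad-row error apart from the follow-up errors. A sketch, assuming the sentinel defined at the linked line is the exported clickhouse.ErrBatchInvalid:

import (
	"errors"

	"github.com/ClickHouse/clickhouse-go/v2"
)

// isPoisonedBatch reports whether err is the follow-up "batch is invalid"
// error rather than the row that actually failed validation. Assumes the
// wrapping preserves the chain for errors.Is.
func isPoisonedBatch(err error) bool {
	return errors.Is(err, clickhouse.ErrBatchInvalid)
}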
> however, we cannot recover from errors sent from ClickHouse.
But any error from Append is local rather than from ClickHouse, right?
@yujiarista @JILeXanDR at the end of the day, I agree this should not break the batch. This requires an enhancement.
@yujiarista @JILeXanDR I had a look at this today and found a discussion (I forgot about it 🤦) we already had in the past on this: https://github.com/ClickHouse/clickhouse-go/issues/655
tl;dr: given the columnar append, it's not trivial to guarantee a reusable batch without data corruption. I still agree it needs to be enhanced, but not sooner than v3.
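To illustrate the problem (a toy model, not the library's internals): with columnar append, each column buffer grows independently, so a row that fails partway through leaves the earlier columns one value longer than the rest, and every earlier column would have to be rolled back to keep the batch usable.

package main

import (
	"fmt"
	"time"
)

// toyBatch mimics a two-column columnar buffer.
type toyBatch struct {
	col1 []string
	col2 []time.Time
}

func (b *toyBatch) appendRow(s string, t time.Time) error {
	b.col1 = append(b.col1, s) // col1 grows first
	if t.Before(time.Unix(0, 0)) {
		// col2 rejects the value, but col1 already holds this row's string;
		// the columns are now different lengths and the batch is corrupt
		// unless col1 is rolled back.
		return fmt.Errorf("dateTime overflow")
	}
	b.col2 = append(b.col2, t)
	return nil
}

func main() {
	b := &toyBatch{}
	_ = b.appendRow("bad row", time.Time{}) // zero time is before the epoch
	fmt.Println(len(b.col1), len(b.col2))   // prints "1 0": columns out of sync
}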