
AsyncInsert does not support Tuple types in either sliced or map styled parameters

Open g-ananth opened this issue 1 year ago • 3 comments

Bug

I have a column that is of type tuple. Batch Append works fine with that column but AsyncInsert throws an Exception.

The insert statement I am using is of the following format.

Insert into db.table (colname1, colname2,..) VALUES ( ?, ? , ? ) 

When using sliced parameters, I get the following exception:

Code: 53. DB::Exception: Type mismatch in IN or VALUES section. 
Expected: Tuple(A Array(String), B Array(UUID), C Array(Array(String)), 
D Array(Array(Int64)), E Array(String), F Array(UInt32), 
G Array(Array(UInt32)), H Array(DateTime64(3, 'UTC')), 
I Array(Array(DateTime64(3, 'UTC')))). 

Got: Array: While executing WaitForAsyncInsert. (TYPE_MISMATCH), 

I get a similar exception when using a map (the tuple is a named tuple): the message reports a Map instead of a Tuple, just like the Array case above.

Steps to reproduce

The following unit test might suffice to reproduce.


func TestNamedTupleWithMapAsync(t *testing.T) {
	conn, err := GetNativeConnection(nil, nil, nil)
	ctx := context.Background()
	require.NoError(t, err)
	// https://github.com/ClickHouse/ClickHouse/pull/36544
	if !CheckMinServerServerVersion(conn, 22, 5, 0) {
		t.Skip(fmt.Errorf("unsupported clickhouse version"))
		return
	}
	
	const ddl = "CREATE TABLE test_tuple (Col1 Tuple(A Array(String), B Array(UUID), C Array(Array(String)), D Array(Array(Int64)), E Array(String), F Array(UInt32), G Array(Array(UInt32)), H Array(DateTime64(3, 'UTC')), I Array(Array(DateTime64(3, 'UTC'))))) Engine MergeTree() ORDER BY tuple()"
	defer func() {
		conn.Exec(ctx, "DROP TABLE IF EXISTS test_tuple")
	}()
	require.NoError(t, conn.Exec(ctx, ddl))
	require.NoError(t, err)
	emptyTuple := make(map[string]any)
	emptyTuple["A"] = make([]string, 0)
	emptyTuple["B"] = make([]string, 0)
	emptyTuple["C"] = make([][]string, 0)
	emptyTuple["D"] = make([][]int64, 0)
	emptyTuple["E"] = make([]string, 0)
	emptyTuple["F"] = make([]uint32, 0)
	emptyTuple["G"] = make([][]uint32, 0)
	emptyTuple["H"] = make([]string, 0)
	emptyTuple["I"] = make([][]string, 0)
	require.NoError(t, conn.AsyncInsert(ctx, "INSERT INTO test_tuple (Col1) VALUES (?) ", true, emptyTuple))
	var (
		col1 map[string]any
	)
	require.NoError(t, conn.QueryRow(ctx, "SELECT * FROM test_tuple").Scan(&col1))
	// slicedEmptyTuples := []any{[]string{}, []string{}, [][]string{}, [][]int64{}, []string{}, []uint32{}, [][]uint32{}, []string{}, [][]string{}} - a sliced empty tuple does not work either
	assert.Equal(t, emptyTuple, col1)
}

Expected behaviour

No exception.

Configuration

The client connection is created with the following additional settings in the client application:

{
"async_insert":                   1,
"wait_for_async_insert":          1,
}
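
For context, these settings are applied at connection time. A minimal sketch of how they can be set via clickhouse.Options (the address and credentials here are placeholders, not from the report):

```go
package main

import (
	"github.com/ClickHouse/clickhouse-go/v2"
)

func open() (clickhouse.Conn, error) {
	return clickhouse.Open(&clickhouse.Options{
		Addr: []string{"127.0.0.1:9000"}, // placeholder address
		Auth: clickhouse.Auth{
			Database: "default", // placeholder credentials
			Username: "default",
		},
		// Server-side settings applied to every query on this connection.
		Settings: clickhouse.Settings{
			"async_insert":          1,
			"wait_for_async_insert": 1,
		},
	})
}
```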

Environment

  • Client version: 2.15.0 (and 2.14.x as well)
  • Language version: go 1.21.3
  • OS: Osx (Unit test run in Docker)
  • Interface: ClickHouse API

ClickHouse server

  • ClickHouse Server version: 23.7.1 revision 54464
  • ClickHouse Server non-default settings, if any:

g-ananth avatar Nov 14 '23 20:11 g-ananth

AsyncInsert does not use the standard batch insert path with the native format. Instead, it relies on a query with inline data.

Since ClickHouse 23.10 you should be able to use batch API with async_insert setting.
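
A sketch of what that combination could look like: the regular batch API with the async_insert server setting supplied through the query context. The table and values mirror the test above; whether the server accepts this for a given type still depends on the server version (>= 23.10 per the comment):

```go
package main

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2"
)

// asyncBatch uses the native-format batch API while asking the server
// to treat the insert as an async insert via per-query settings.
func asyncBatch(ctx context.Context, conn clickhouse.Conn) error {
	ctx = clickhouse.Context(ctx, clickhouse.WithSettings(clickhouse.Settings{
		"async_insert":          1,
		"wait_for_async_insert": 1,
	}))
	batch, err := conn.PrepareBatch(ctx, "INSERT INTO test_tuple (Col1)")
	if err != nil {
		return err
	}
	// Named tuple passed as a map, as in the test case above; all nine
	// fields (A..I) would need to be present for a real insert.
	tuple := map[string]any{
		"A": []string{},
		"B": []string{},
		// ...
	}
	if err := batch.Append(tuple); err != nil {
		return err
	}
	return batch.Send()
}
```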

jkaflik avatar Nov 15 '23 16:11 jkaflik

@jkaflik does that mean I will be able to use batch insert and set some flag so it actually runs as an async insert? I am using the clickhouse-go lib inside a high-frequency application to stream lots of data to ClickHouse, and with batch inserts and default connection options I hit error preparing new batch: error="clickhouse: acquire conn timeout. you can increase the number of max open conn or the dial timeout".

Wondering if it makes sense here to switch completely to async inserts (about 2k-3k rows per second or more) but run them as anonymous goroutines so the performance of the rest of my binary isn't affected.

AdvaithD avatar Nov 16 '23 21:11 AdvaithD

@AdvaithD async insert is nothing more than pushing batching into ClickHouse. I'd still recommend batching on the client side at a reasonable size; depending on your data, that can be a few thousand rows or even hundreds of thousands.

Error message:

you can increase the number of max open conn

is explicit about what happens: your connection pool hit its limit, and you cannot acquire more connections until the client-side max open connections limit is increased. IMO batch inserts work better with a limited number of connections. Consider having one or two threads doing only batch ingestion. It depends on your app architecture.
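
For reference, the pool limits in question are configured on clickhouse.Options when opening the connection. A sketch with illustrative values (the numbers and address are placeholders, not recommendations):

```go
package main

import (
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
)

// openWithPool shows where the client-side pool limits live.
func openWithPool() (clickhouse.Conn, error) {
	return clickhouse.Open(&clickhouse.Options{
		Addr:            []string{"127.0.0.1:9000"}, // placeholder address
		MaxOpenConns:    10,                         // raise if acquires time out
		MaxIdleConns:    5,
		ConnMaxLifetime: time.Hour,
		DialTimeout:     5 * time.Second,
	})
}
```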

jkaflik avatar Nov 17 '23 07:11 jkaflik