clickhouse-go icon indicating copy to clipboard operation
clickhouse-go copied to clipboard

Insert into ClickHouse table with DateTime64 data type using character type time, the value is 8 hours more than the original value

Open zhoufuke opened this issue 3 years ago • 9 comments

Issue description

Hello!

Describe the unexpected behaviour When the time data of timestamp(n) or datetime(n) is converted into character format using Golang , inserted into DateTime64(n) of ClickHouse, the inserted data is 8 hours longer,n>0。

ClickHouse timezone is Asia/Shanghai。 ClickHouse Version is 22.6.2.12 and 22.6.3.35,they are upgraded from 19.x。

What's causing this?

Example code

import (
"context"
"fmt"
_ "github.com/ClickHouse/clickhouse-go"
"github.com/jmoiron/sqlx"
"log"
"time"
)
............
tx, err := allclient.Begin()

if err != nil {

	log.Println("xxx", err)

}

log.Println("start Prepare")

stmt, err := tx.Prepare("INSERT INTO test(id,cretime,cretime1,cretime2,cretime3) VALUES (?,?,?,?,?)")

if err != nil {

	log.Println("xxx2", err)

}

if _, err := stmt.Exec(
	23,
	"2022-07-20 17:42:48",
	"2022-07-20 17:42:48",
	"2022-07-20 17:42:48.129",
	"2022-07-20 17:42:48.129",
); err != nil {

	log.Fatal("2", err)

}

if err := tx.Commit(); err != nil {

	log.Fatal("3", err)

}

Configuration

Driver version: <=1.5.4

Go version: 1.18 or 1.16

ClickHouse Server version: 22.6.3.35

Table Structure:

CREATE TABLE test
(
    `id` Int64,
    `cretime` Nullable(DateTime),
    `cretime1` Nullable(DateTime),
    `cretime2` Nullable(DateTime64(3)),
    `cretime3` Nullable(DateTime64(3)),
)
ENGINE = ReplacingMergeTree
ORDER BY id
SETTINGS index_granularity = 8192;

The result:

[root@root ~]# clickhouse-client --user test --password
ClickHouse client version 22.6.3.35 (official build).
Password for user (test): 
Connecting to localhost:9000 as user test.
Connected to ClickHouse server version 22.6.3 revision 54455.
root :) use test;

USE test

Query id: ba395487-6ee0-4f65-8ecf-aa01162a0f4f

Ok.

0 rows in set. Elapsed: 0.007 sec. 

root :) select * from test where id = 23;

SELECT *
FROM test
WHERE id = 23

Query id: bcb6a4f8-8f52-47ac-b1ce-4b98a99cd539

┌─id─┬─────────────cretime─┬────────────cretime1─┬────────────────cretime2─┬────────────────cretime3─┐
│ 23 │ 2022-07-20 17:42:48 │ 2022-07-20 17:42:48 │ 2022-07-21 01:42:48.129 │ 2022-07-21 01:42:48.129 │
└────┴─────────────────────┴─────────────────────┴─────────────────────────┴─────────────────────────┘

1 row in set. Elapsed: 0.012 sec. 

The now() and timezone() result:

root:) select now(),timezone();

SELECT
    now(),
    timezone()

Query id: 36ab82ce-5048-4a90-814c-22f8efaf024d

┌───────────────now()─┬─timezone()────┐
│ 2022-07-21 14:44:48 │ Asia/Shanghai │
└─────────────────────┴───────────────┘

1 row in set. Elapsed: 0.005 sec. 

The timezone in config is:

[root@root ~]# cat /etc/clickhouse-server/config.xml|grep timezone
<timezone>Asia/Shanghai</timezone>

The server timezone:

[root@root~]# timedatectl 
      Local time: Thu 2022-07-21 14:46:27 CST
  Universal time: Thu 2022-07-21 06:46:27 UTC
        RTC time: Thu 2022-07-21 06:46:27
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: no
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a

zhoufuke avatar Jul 25 '22 11:07 zhoufuke

I'll put a test together for this to illustrate but I don't think clickhouse-go is considering the Locale of the column itself if its not explicitly set in the column defn.

From https://clickhouse.com/docs/en/sql-reference/data-types/datetime

You can explicitly set a time zone for DateTime-type columns when creating a table. Example: DateTime('UTC'). If the time zone isn’t set, ClickHouse uses the value of the timezone parameter in the server settings or the operating system settings at the moment of the ClickHouse server start.

Now if you set this in the column I suspect the behaviour will be as expected. I'll look if we can possibly read the tz here.

gingerwizard avatar Jul 26 '22 12:07 gingerwizard

Ok, behaviour varies but v2 is correct I believe. v1 is broken but not as the issue suggests. This indeed seems to differ from ClickHouse-client.

The driver is parsing the string and sending as an epoch uint64. During encoding locale is not considered. The locale is either set on the server or in the column. In the following example, I have <timezone>Asia/Shanghai</timezone> on the server.

ClickHouse Version: 22.6.1.1985

Schema

CREATE TABLE test_date (
			Id Int64,
			Col1 DateTime,
			Col2 DateTime('UTC'),
			Col3 DateTime64(3),
		        Col4 DateTime64(3, 'UTC')
		) Engine Memory

v1

func Test690(t *testing.T) {
	conn, err := sql.Open("clickhouse", "clickhouse://127.0.0.1:9000")
	require.NoError(t, err)
	const ddl = `
		CREATE TABLE test_date (
			Id Int64,
			Col1 DateTime,
			Col2 DateTime('UTC'),
			Col3 DateTime64(3),
		        Col4 DateTime64(3, 'UTC')
		) Engine Memory
		`
	conn.Exec("DROP TABLE test_date")
	_, err = conn.Exec(ddl)
	require.NoError(t, err)
	scope, err := conn.Begin()
	require.NoError(t, err)
	batch, err := scope.Prepare("INSERT INTO test_date")
	require.NoError(t, err)
	_, err = batch.Exec(
		int64(23),
		"2022-07-20 17:42:48.129",
		"2022-07-20 17:42:48.129",
		"2022-07-20 17:42:48.129",
		"2022-07-20 17:42:48.129",
	)
	require.NoError(t, err)
	require.NoError(t, scope.Commit())
}

DateTime64 is parsed with the pattern 2006-01-02 15:04:05.999 and it converts the uint64 epoch independent of the timezone. DateTime uses 2006-01-02 15:04:05 but critically it assumes Local timezone i.e. the client's timezone.

My local time is -1 UTC.

Reading this data with ClickHouse client, I get

SELECT *
FROM test_date

Query id: 57129d1c-e666-455e-b49c-8dccdd730204

┌─Id─┬────────────────Col1─┬────────────────Col2─┬────────────────────Col3─┬────────────────────Col4─┐
│ 23 │ 2022-07-21 00:42:48 │ 2022-07-20 16:42:48 │ 2022-07-21 01:42:48.129 │ 2022-07-20 17:42:48.129 │
└────┴─────────────────────┴─────────────────────┴─────────────────────────┴─────────────────────────┘

Col4 - is correct. This spec says epoch internally and its rendered according to UTC - the value of 1658338968129 Col3 - It is using the server's timezone and rendering in Shanhai time. i think this is also correct. Col2 - The bug in adjusting to local has caused the date to be inserted incorrectly. Hence we loose an hr. Col1 - same as Col2 - server is adjusting but insertion lost an hr.

v2

v2 doesn't support sending a string for DateTime yet. We need to close this gap. Insert code is therefore simpler.

func Test690(t *testing.T) {
	conn, err := sql.Open("clickhouse", "clickhouse://127.0.0.1:9000")
	require.NoError(t, err)
	if err := std.CheckMinServerVersion(conn, 21, 9, 0); err != nil {
		t.Skip(err.Error())
		return
	}
	const ddl = `
		CREATE TABLE test_date (
			Id Int64,
			Col3 DateTime64(3),
		        Col4 DateTime64(3, 'UTC')
		) Engine Memory
		`
	conn.Exec("DROP TABLE test_date")
	_, err = conn.Exec(ddl)
	require.NoError(t, err)
	scope, err := conn.Begin()
	require.NoError(t, err)
	batch, err := scope.Prepare("INSERT INTO test_date")
	require.NoError(t, err)
	_, err = batch.Exec(
		int64(23),
		"2022-07-20 17:42:48.129",
		"2022-07-20 17:42:48.129",
	)
	require.NoError(t, err)
	require.NoError(t, scope.Commit())
}

Behaviour at insert time for DateTime64 is identical for v2 and v1.

From clickhouse-client,

SELECT *
FROM test_date

Query id: 82ef27b8-daf1-4542-b840-b9cd798f228e

┌─Id─┬────────────────────Col3─┬────────────────────Col4─┐
│ 23 │ 2022-07-21 01:42:48.129 │ 2022-07-20 17:42:48.129 │
└────┴─────────────────────────┴─────────────────────────┘

1 row in set. Elapsed: 0.050 sec.

Both of these are correct.

ClickHouse Client

DROP TABLE IF EXISTS test_date;
CREATE TABLE test_date
(
    Id Int64,
    Col1 DateTime,
    Col2 DateTime('UTC'),
    Col3 DateTime64(3),
    Col4 DateTime64(3, 'UTC')
)
ENGINE = Memory;

INSERT INTO test_date VALUES (23, '2022-07-20 17:42:48', '2022-07-20 17:42:48', '2022-07-20 17:42:48.129', '2022-07-20 17:42:48.129');
SELECT * FROM test_date;
clickhouse :) SELECT * FROM test_date;

SELECT *
FROM test_date

Query id: cd0b1b5f-051d-4171-b80e-16d13fca270d

┌─Id─┬────────────────Col1─┬────────────────Col2─┬────────────────────Col3─┬────────────────────Col4─┐
│ 23 │ 2022-07-20 17:42:48 │ 2022-07-20 17:42:48 │ 2022-07-20 17:42:48.129 │ 2022-07-20 17:42:48.129 │
└────┴─────────────────────┴─────────────────────┴─────────────────────────┴─────────────────────────┘

1 row in set. Elapsed: 0.021 sec.

This seems incorrect to me and I've reached out to the clickhouse-client team to understand the differences here.

Summary

  1. v1 - is take the local timezone of the client at insert time. This feels like a bug. I'm assuming @zhoufuke your client was on UTC.
  2. v2 - this is correct, I think.

gingerwizard avatar Jul 26 '22 17:07 gingerwizard

What time do you expect when specifying 2022-07-20 17:42:48 on insert?

  • Server time zone
  • Client local time zone
  • UTC

ernado avatar Jul 27 '22 12:07 ernado

What time do you expect when specifying 2022-07-20 17:42:48 on insert?

  • Server time zone
  • Client local time zone
  • UTC

I want the Client local time zone.

zhoufuke avatar Jul 28 '22 02:07 zhoufuke

Thanks.

  • My client tz is UTC+8
  • I think adding UTC on the column is just a temporary fix. Also, I think the data is presented in the time zone of UTC, which is wrong data.

zhoufuke avatar Jul 29 '22 00:07 zhoufuke

So currently, 2022-07-20 17:42:48 is inserted as 1658338968129 i.e. UTC. When you read it back, it's rendered in UTC+8. We have a few options here at insert time:

  1. Use the tz of the string or time.Time{}
  2. Use the column tz
  3. Use the server timezone
  4. Assume UTC - this is what we do now.
  5. Use the local tz of the client.

Wrt to rendering - we have more options:

  1. Render according to column tz if available - we do this now.
  2. Render according to local tz - we do this if (1) isn't available.
  3. Use the server timezone

We need document and agree defaults and fall backs.

gingerwizard avatar Jul 29 '22 09:07 gingerwizard

This is further complicated by the fact column TZ info is not returned over HTTP even if using the native format (and the column has a tz) https://github.com/ClickHouse/ClickHouse/issues/38209

gingerwizard avatar Jul 29 '22 19:07 gingerwizard

Thanks. Currently, using a string column to solve the problem.

zhoufuke avatar Aug 01 '22 07:08 zhoufuke

@zhoufuke we discussed and agreed on a specification for all clients - https://github.com/ClickHouse/clickhouse-go/issues/719

In summary, we will revert to the tz of the client - consistent with your desired behaviour. Thanks for the patience and for raising the issue - which prompted our discussions.

gingerwizard avatar Aug 11 '22 11:08 gingerwizard

ok,thanks.

zhoufuke avatar Aug 22 '22 09:08 zhoufuke