embulk-input-jdbc icon indicating copy to clipboard operation
embulk-input-jdbc copied to clipboard

[MySQL] last_record with timestamp column stored wrong timestamp.

Open hiroyuki-sato opened this issue 8 years ago • 3 comments

  • Embulk: 0.8.39
  • embulk-input-mysql: 0.8.2
  • MySQL: 5.7.19
  • Server and Client timezone: JST.
  • sample_data: sample.txt

Those data is JST timezone.

mysql> select * from in_mysql_test;
+------+---------------------+-------+
| c0   | c1                  | c2    |
+------+---------------------+-------+
|    1 | 2017-12-01 00:00:00 | dummy |
|    2 | 2017-12-01 00:01:00 | dummy |
...
| 1440 | 2017-12-01 23:59:00 | dummy |
| 1441 | 2017-12-02 00:00:00 | dummy |
+------+---------------------+-------+

in:
  type: mysql
  host: 127.0.0.1
  user: {{ env.mysql_user }}
  password: {{ env.mysql_password }}
  table: in_mysql_test
  database: embulk_test
  options: { useLegacyDatetimeCode: false }
  incremental_columns:
  - c1
  incremental: true

out:
  type: stdout

Execute embulk run -o diff.yml input_mysql.yml.liquid

in:
  last_record: ['2017-12-01T06:00:00.000000Z']
out: {}
1439,2017-12-01 14:58:00,dummy
1440,2017-12-01 14:59:00,dummy
1441,2017-12-01 15:00:00,dummy
2017-12-07 17:31:58.364 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2017-12-07 17:31:58.370 +0900 [INFO] (main): Committed.
2017-12-07 17:31:58.371 +0900 [INFO] (main): Next config diff: {"in":{"last_record":["2017-12-01T06:00:00.000000Z"]},"out":{}}

The last_record should be 2017-12-01T15:00:00.000000Z but It set 2017-12-01T06:00:00.000000Z

And second execution log

2017-12-07 20:05:33.310 +0900 [INFO] (0018:task-0000): SQL: SELECT * FROM `in_mysql_test` WHERE ((`c1` > ?)) ORDER BY `c1`
2017-12-07 20:05:33.311 +0900 [INFO] (0018:task-0000): Parameters: ["2017-12-01T06:00:00.000000Z"]

Reference

hiroyuki-sato avatar Dec 07 '17 08:12 hiroyuki-sato

Hello, @ganchiku

Thank you for your comment, Does this mean the following change?

It seems to work well in my environment. @hito4t What do you think this change?

diff --git a/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java b/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java
index f128ee1..9033c8f 100644
--- a/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java
+++ b/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java
@@ -22,8 +22,8 @@ public class MySQLTimestampTimestampIncrementalHandler
     @Override
     public org.embulk.spi.time.Timestamp utcTimestampFromSessionTime(long epochSecond, int nano)
     {
-        long sec = sessionTimeZone.convertLocalToUTC(epochSecond * 1000, false) / 1000;
-        return org.embulk.spi.time.Timestamp.ofEpochSecond(sec, nano);
+//        long sec = sessionTimeZone.convertLocalToUTC(epochSecond * 1000, false) / 1000;
+        return org.embulk.spi.time.Timestamp.ofEpochSecond(epochSecond, nano);
     }

     @Override

@hito4t

After applying above change, The last_record result is 2017-12-01T15:00:00.000000Z. It expects value.

I want to verify one thing. Does this timestamp parameter change to local timezone before query? (Change 2017-12-01T15:00:00.000000Z -> 2017-12-02T00:00:00+0900)

2017-12-08 18:48:06.486 +0900 [INFO] (0018:task-0000): SQL: SELECT * FROM `in_mysql_test` WHERE ((`c1` > ?)) ORDER BY `c1`
2017-12-08 18:48:06.487 +0900 [INFO] (0018:task-0000): Parameters: ["2017-12-01T15:00:00.000000Z"]

Because when I execute the following SQL, It returns the wrong result. But this plugin returns no data.(correct)

SELECT * FROM `in_mysql_test` WHERE ((`c1` > '2017-12-01T15:00:00.000000Z' )) ORDER BY `c1`;
+------+---------------------+-------+
| c0   | c1                  | c2    |
+------+---------------------+-------+
|  902 | 2017-12-01 15:01:00 | dummy |
|  903 | 2017-12-01 15:02:00 | dummy |
|  904 | 2017-12-01 15:03:00 | dummy |
|  905 | 2017-12-01 15:04:00 | dummy |

hiroyuki-sato avatar Dec 08 '17 10:12 hiroyuki-sato

@hiroyuki-sato

Thank you for your investigation, and I'm sorry that I have deleted my previous comment on this issue. I am still skeptical that my configuration might be bad.

But it was follows:

Thank you for raising this issue. I'm new to Embulk, and now trying to research how to use this software.

I looked into the embulk-input-mysql, and found something strange, and would like to share it.

https://github.com/embulk/embulk-input-jdbc/blob/master/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java#L23

In `utcTimestampFromSessionTime` method, if the MySQL time_zone is JST, both `sessionTimeZone.convertLocalToUTC` and `org.embulk.spi.time.Timestamp.ofEpochSecond`  do UTC calculation.

If I removed `sessionTimeZone.convertLocalToUTC` from the method, it works, but I don't know if it is right fix for this problem. I also tried if MySQL time_zone is UTC, and it also works.

I can make PR for the fix, or if you have better idea, please provide how.

Thanks,

I will look into this issue more.

Thank you.

ganchiku avatar Dec 11 '17 02:12 ganchiku

@ganchiku @hiroyuki-sato Thank you for the information!

As you wrote, we shoudn't convert timestamp in utcTimestampFromSessionTime method because the timestamp returned from the server will be already valid.

If you make PR, I'll merge.

hito4t avatar Dec 14 '17 06:12 hito4t