ecto_ch icon indicating copy to clipboard operation
ecto_ch copied to clipboard

Possible timestamp issue

Open Zarathustra2 opened this issue 2 years ago • 22 comments
trafficstars

Given the following test for timestamps_test.exs

  test "filtering datetimes" do
    # iso8601 type
    time = ~U[2023-04-20 17:00:25Z]

    account =
      %Account{}
      |> Account.changeset(%{name: "Test"})
      |> TestRepo.insert!()

    %Product{}
    |> Product.changeset(%{
      account_id: account.id,
      name: "Qux1",
      approved_at: time
    })
    |> TestRepo.insert!()

    from(
      u in Product,
      where: u.approved_at == ^time
    )
    |> TestRepo.exists?()
    |> IO.inspect()

    from(
      u in Product,
      where: fragment("? = ?", u.approved_at, ^time)
    )
    |> TestRepo.exists?()
    |> IO.inspect()

    from(
      u in Product,
      where: fragment("? = ?", u.approved_at, ^DateTime.to_unix(time, :millisecond))
    )
    |> TestRepo.exists?()
    |> IO.inspect()
  end

All 3 cases will print false for me.

Is this expected and if so why?

Zarathustra2 avatar Apr 20 '23 17:04 Zarathustra2

It's probably your non-UTC ClickHouse server timezone issue. It happens due to timestamps in params being encoded as text and in rows -- as ints. I think I have "fixed" it in Ch a few days ago: https://github.com/plausible/ch#select-rows

But even with that change you'd still encounter issues with naive timestamp unless you switch to a UTC server.

ruslandoga avatar Apr 21 '23 02:04 ruslandoga

But in my test I am not using a native timestamp I am using a timestamp with an UTC offset. So if I insert that timestamp and reuse to filter for that row I should get that inserted row since ~U[2023-04-20 17:00:25Z] has a timezone.

So I kinda expect the tests to pass so I assume there is still something going wrong or am I misunderstanding something?

Zarathustra2 avatar Apr 21 '23 10:04 Zarathustra2

For ~U[..] it should already be fixed in Ch master: https://github.com/plausible/ch#select-rows

ruslandoga avatar Apr 21 '23 11:04 ruslandoga

I updated my local chto to latest ch (commit d305fa092515a106a20f517e642228f75bbe2001) and now the test code above fails.

    from(
      u in Product,
      where: u.approved_at == ^time
    )
    |> TestRepo.exists?()
    |> IO.inspect()

^ This still yields false

    from(
      u in Product,
      where: fragment("? = ?", u.approved_at, ^time)
    )
    |> TestRepo.exists?()
    |> IO.inspect()

^ This now throws an error:

  1) test filtering datetimes (Ecto.Integration.TimestampsTest)
     stacktrace:
       (ch 0.1.0) lib/ch/query.ex:197: DBConnection.Query.Ch.Query.encode_param/1
       (ch 0.1.0) lib/ch/query.ex:182: anonymous fn/1 in DBConnection.Query.Ch.Query.query_params/1
       (elixir 1.14.3) lib/enum.ex:1658: Enum."-map/2-lists^map/1-0-"/2
       (ch 0.1.0) lib/ch/query.ex:100: DBConnection.Query.Ch.Query.encode/3
       (db_connection 2.5.0) lib/db_connection.ex:1336: DBConnection.maybe_encode/4
       (db_connection 2.5.0) lib/db_connection.ex:707: DBConnection.execute/4
       (ch 0.1.0) lib/ch.ex:51: Ch.query/4
       (ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:438: Ecto.Adapters.SQL.query!/4
       (chto 0.1.0) lib/ecto/adapters/clickhouse.ex:235: Ecto.Adapters.ClickHouse.execute/5
       (ecto 3.10.1) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
       (ecto 3.10.1) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
       (ecto 3.10.1) lib/ecto/repo/queryable.ex:134: Ecto.Repo.Queryable.exists?/3
       test/ecto/integration/timestamps_test.exs:71: (test)

Zarathustra2 avatar Apr 21 '23 11:04 Zarathustra2

I see. Checking

ruslandoga avatar Apr 21 '23 11:04 ruslandoga

~~@Zarathustra2 I've added the test and it seems to be passing for me (UTC timezone).~~ I misunderstood 🙈

  test "filtering datetimes" do
    # iso8601 type
    time = ~U[2023-04-20 17:00:25Z]

    account =
      %Account{}
      |> Account.changeset(%{name: "Test"})
      |> TestRepo.insert!()

    %Product{}
    |> Product.changeset(%{
      account_id: account.id,
      name: "Qux1",
      approved_at: time
    })
    |> TestRepo.insert!()

    assert TestRepo.exists?(
             from u in Product,
               where: u.approved_at == ^time
           )

    assert TestRepo.exists?(
             from u in Product,
               where: fragment("? = ?", u.approved_at, ^time)
           )

    # `refute` because `DateTime` is stored as seconds, for milliseconds precision use `DateTime64(3)`
    refute TestRepo.exists?(
             from u in Product,
               where: fragment("? = ?", u.approved_at, ^DateTime.to_unix(time, :millisecond))
           )
  end

ruslandoga avatar Apr 21 '23 12:04 ruslandoga

Hmmm, if you run select timezone(); what do you get?

I get Europe/Berlin

Zarathustra2 avatar Apr 21 '23 12:04 Zarathustra2

UTC

ruslandoga avatar Apr 21 '23 12:04 ruslandoga

What if you change ur database to Europe/Berlin or anything that is not on UTC timezone. I think then the test will fail

Zarathustra2 avatar Apr 21 '23 12:04 Zarathustra2

at least that is the only difference I can now spot between your env and my as why the test fails

Zarathustra2 avatar Apr 21 '23 12:04 Zarathustra2

It seems to be more complicated than I thought. Simply encoding UTC datetimes to unix timestamps in query strings is not enough.

ruslandoga avatar Apr 21 '23 12:04 ruslandoga

https://github.com/plausible/chto/pull/46 still fails on Europe/Berlin timezone.

ruslandoga avatar Apr 21 '23 13:04 ruslandoga

Seems like the timestamp inserted in RowBinary is also shifted by ClickHouse.

ruslandoga avatar Apr 21 '23 13:04 ruslandoga

Simply encoding UTC datetimes to unix timestamps in query strings is not enough.

That is kinda weird because technically according to their documentation that should do it

(Dealing with time & timezones must be the most annoying domain problem I encounter :D )

Zarathustra2 avatar Apr 21 '23 13:04 Zarathustra2

Should the CI maybe also run with a different clickhouse timezone setting so that in the future we can catch future timezone issues?

Zarathustra2 avatar Apr 21 '23 13:04 Zarathustra2

For reference, this is what Ch is doing now: https://github.com/plausible/ch/commit/cea3726a83800269158c8f4e5f4896e90c2d90d9

ruslandoga avatar Apr 21 '23 13:04 ruslandoga

Should the CI maybe also run with a different clickhouse timezone setting so that in the future we can catch future timezone issues?

Sure, but that'd would require reworking the tests a bit since naive datetimes would still fail. Probably a tag for each naive test like @tag :naive and ExUnit.start(except: [:naive]) in test_helper.exs based on some env var would be enough. But adding that right now is low priority since Plausible is all UTC :)

ruslandoga avatar Apr 21 '23 13:04 ruslandoga

I'll return to this issue after finishing https://github.com/plausible/chto/pull/43

ruslandoga avatar Apr 21 '23 13:04 ruslandoga

Thanks my friend. I am currently slammed with work (postgres -> clickhouse migration :D ) but looking to do https://github.com/plausible/chto/issues/37 over the weekend.

Zarathustra2 avatar Apr 21 '23 13:04 Zarathustra2

I have taken another look, look at the queries:


false
16:18:06.013 [debug] QUERY OK source="products" db=2.9ms idle=191.2ms
SELECT true FROM "products" AS p0 WHERE (p0."approved_at" = {$0:DateTime}) LIMIT 1 [~N[2023-04-20 17:00:25]]

true
16:18:06.019 [debug] QUERY OK source="products" db=2.8ms idle=195.7ms
SELECT true FROM "products" AS p0 WHERE (p0."approved_at" = {$0:DateTime}) LIMIT 1 [~U[2023-04-20 17:00:25Z]]

for some reason the DateTime gets casted to a NaiveDateTime. I ahve not been successful yet to locate where this happens and why but this is the issue. If it would use not cast it it would behave the same as the fragment query

Zarathustra2 avatar Apr 25 '23 14:04 Zarathustra2

Nice find, Ecto schemas use naive datetimes by default.

ruslandoga avatar Apr 25 '23 14:04 ruslandoga

@Zarathustra2 I've added Europe/Berlin to CI in https://github.com/plausible/ecto_ch/pull/46 and adapted tests to pass. I've also added a modified version of your test to highlight the difference in "product" vs Product Ecto Query sources. The former doesn't cast ~U[] to ~N[] and the latter does. That affects how these datetimes are encoded in SELECT query params.

ruslandoga avatar Nov 11 '23 11:11 ruslandoga