ecto icon indicating copy to clipboard operation
ecto copied to clipboard

Zero Time ~T[00:00:00] saving as -838:59:59 for schema fields with :time causing exception on retrieval.

Open wnoonan opened this issue 2 years ago • 5 comments

Elixir version

1.13.4

Database and Version

MySQL 5.7.38

Ecto Versions

3.8.4

Database Adapter and Versions (postgrex, myxql, etc)

MyXQL 0.6.2

Current behavior

Disclaimer: code snippets are not exact and i do not know if they will compile, I can't paste my actual code in here, but my code does compile and is similar.

On Elixir 1.13.4 and Erlang/OTP 24 [erts-12.3.2.2] [source] [64-bit] [smp:16:2] [ds:16:2:10]

MySQL specifics: strict mode is turned off both globally and session level, collation is latin1_swedish_ci and charset is latin1

Given an Ecto schema, below is just a simple example, having at least one field with :time :

defmodule Thing do
  use Ecto.Schema
  import Ecto.Changeset

  schema "thing" do
    field :some_time, :time
  end

  def changeset(thing, params \\ %{}) do
    thing
    |> cast(params, [:some_time])
    |> validate_required([:some_time])
  end
end

When I create a change set it demonstrates the change and casts the type properly. When I save it, it also returns an :ok response, indicating that it saved successfully:

Thing.changeset(%Thing{}, %{some_time: ~T[00:00:00]}) |> Thing.insert!() // or update, or Repo.update/insert.

Changeset output:

#Ecto.Changeset<                                                                                                                                              
  action: nil,                                                                                                                                                
  changes: %{some_time: ~T[00:00:00]},                                                                                                                         
  errors: [],                                                                                                                                                 
  data: #X.Thing<>,                                                                                                                         
  valid?: true                                                                                                                                                
> 

Once I do this and check the database, the value is stored as -838:59:59

Where everything seems to go wrong is when i Thing.get(id), I receive this exception:

** (ArgumentError) cannot decode "-34d 22:59:59" as time, negative or >= 24:00:00 values are not supported                                                    
    (myxql 0.6.2) lib/myxql/protocol/values.ex:518: MyXQL.Protocol.Values.time/6
    (myxql 0.6.2) lib/myxql/protocol/values.ex:488: MyXQL.Protocol.Values.decode_time/4
    (myxql 0.6.2) lib/myxql/protocol.ex:570: MyXQL.Protocol.decode_resultset/4                                                                                
    (myxql 0.6.2) lib/myxql/client.ex:227: MyXQL.Client.recv_packets/7                                                                                        
    (myxql 0.6.2) lib/myxql/connection.ex:101: MyXQL.Connection.handle_execute/4
    (db_connection 2.4.2) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.4.2) lib/db_connection.ex:1364: DBConnection.run_execute/5                                                                               
    (db_connection 2.4.2) lib/db_connection.ex:1459: DBConnection.run/6       

Here is the real kicker. On the exact same MySQL database, in the exact same container with the exact same versions of everything, the only difference being (and this may be a hugely important difference, I am not sure) locally I build the container on a macbook (arm64/m1) whereas the container running in our GCP Kube cluster experiencing this issue is built within a github action. There is otherwise no discernible difference that i can find.

I went down some bit syntax rabbit holes and also decided to check the return values of these just in case you may need it:

System.compiled_endianness() which returns :little in both local and where i'm having the issue.

I can alternative save it properly with a sql statement:

Ecto.Adapters.SQL.query!(X.Repo, "update thing set some_time = '00:00:00' where id = ?", [1])

I don't know exactly how to get this bug to manifest for you, I am however reasonably certain it may have something to do with cpu architecture and the erlang vm. I'm mostly guessing because there is some bits and binary things going on in the MyXQL adapter.

I have ruled out the database being the issue as it is a constant between local and my staging environment and i can save this value successfully to the same database locally. Also if there is any additional information I can get to you let me know.

This may also need to move to the MyXQL repo but, i'm not certain where the actual issue is happening.

If anyone could help me resolve this I would greatly appreciate it.

Thanks.

Expected behavior

~T[00:00:00] should be stored as 00:00:00 not -838:59:59

wnoonan avatar Nov 11 '22 02:11 wnoonan

If this is helpful at all, this is the docker image I am using: FROM hexpm/elixir:1.13.4-erlang-24.3.4.2-debian-stretch-20210902

It is built inside of a Github action before being deployed to our cloud cluster.

When testing this locally I build the container locally with docker compose. I point it to the same MySQL database in the cloud though.

wnoonan avatar Nov 11 '22 02:11 wnoonan

If you’d be able to reproduce it with a script like this one it would help a lot: https://github.com/wojtekmach/mix_install_examples/blob/main/ecto_sql.exs

wojtekmach avatar Nov 11 '22 03:11 wojtekmach

In addition to what wojtekmach said, did you rule out the following:

  1. Your app is modifying the time before inserting it. The time you're receiving is exactly the minimum time allowed by MySQL. Is there something going "mininum time + x" where, x = 0 in your case. The reason I'm suspecting this is because MySQL accepts time 00:00:00 in a unique way so it's really odd for it to be saved like this. Or is there maybe a trigger modifying the value in your prod database but not your dev database?
  2. The data was inserted outside of Ecto and when you use Ecto it's not actually changing anything.

greg-rychlewski avatar Nov 11 '22 04:11 greg-rychlewski

If you’d be able to reproduce it with a script like this one it would help a lot: https://github.com/wojtekmach/mix_install_examples/blob/main/ecto_sql.exs

I can see if I can do this and get back with you. Thank you for such a quick response!

In addition to what wojtekmach said, did you rule out the following:

  1. Your app is modifying the time before inserting it. The time you're receiving is exactly the minimum time allowed by MySQL. Is there something going "mininum time + x" where, x = 0 in your case. The reason I'm suspecting this is because MySQL accepts time 00:00:00 in a unique way so it's really odd for it to be saved like this. Or is there maybe a trigger modifying the value in your prod database but not your dev database?

Hey thanks for responding!

  1. I don't believe the application is modifying the time, when i pass log: true to the insert/update Repo funcs I get a result like this:
[debug] QUERY OK db=5.7ms queue=0.3ms idle=1939.9ms                                                                                              
UPDATE `thing` SET `end_time` = ? WHERE `id` = ? [~T[00:00:00], 1] 

This database does not have any triggers, and just to be clear this is happening with the same MySQL database instance locally and in our staging environment (i'm pointing my local running instance to the same cloud MySQL DB instance that the application running in the cloud is using), the only difference is where the application is running.

  1. The data was inserted outside of Ecto and when you use Ecto it's not actually changing anything.

I'm not sure what you mean? I insert it with the code snippet above in my original issue and it successfully inserts, it just happens to be -838:59:59 after viewing it in the database.

wnoonan avatar Nov 11 '22 15:11 wnoonan

Very strange..

BTW maybe turning on strict mode will cause an error to happen when inserting and might give more info.

greg-rychlewski avatar Nov 11 '22 16:11 greg-rychlewski

Hi @wnoonan, did you have any luck reproducing it? :)

josevalim avatar Dec 20 '22 19:12 josevalim

Hey @josevalim @wojtekmach and @wnoonan, during my journey to help to contribute to Elixir repos I built this gist with a setup to reproduce the error using the same version of MySQL, Elixir, Erlang OTP and libs, but cannot find anything as well, so maybe this issue should be closed?

marciol avatar Feb 04 '23 04:02 marciol

Let’s close this for now since we can’t reproduce it but as soon as we have more details and ideally a Mix.install script, we’ll jump right back in. Thanks!

wojtekmach avatar Feb 04 '23 06:02 wojtekmach

Sharing an update, I've forked this repository and removed these lines in the myxql adapter. In my project I reference this version now:

Screen Shot 2023-02-24 at 16 47 16

Removing this fixes the issue, however I still don't understand why it fixes the issue. I'm leaving this here in case anyone may have a better understanding of the low level bit/binary things going on and how they may relate to the system architectures that these containers are running on.

Thanks.

wnoonan avatar Feb 24 '23 22:02 wnoonan

The link isn't working anymore, so here is one from the way back machine: https://web.archive.org/web/20210731185906/https://dev.mysql.co.m/doc/internals/en/binary-protocol-value.html.

In short, the <<0>> is a compressed packet, by removing the lines then, you avoid sending a compressed packet and send it in its entirety.

Avoiding this optimization would ensure that versions that don't handle it correctly will work with the adapter.

I believe a PR is welcome, cc @wojtekmach

Schultzer avatar Feb 25 '23 16:02 Schultzer

The link isn't working anymore, so here is one from the way back machine: https://web.archive.org/web/20210731185906/https://dev.mysql.co.m/doc/internals/en/binary-protocol-value.html.

In short, the <<0>> is a compressed packet, by removing the lines then, you avoid sending a compressed packet and send it in its entirety.

Avoiding this optimization would ensure that versions that don't handle it correctly will work with the adapter.

I believe a PR is welcome, cc @wojtekmach

@Schultzer

When I remove those lines ~T[00:00:00] will pattern match on the next method:

Screen Shot 2023-02-27 at 09 36 50

To be clear I don't think this is a problem specifically with the myxql adapter as I can save and retrieve ~T[00:00:00] against the same database with a locally (arm64/m1) built container with the exact same versions of everything. Because of that I don't think a pull request is necessary.

My intention was to shed light on what makes it work for me in the hopes that someone with a firm understanding of what is happening at the adapter/sql level may be able to point me to what system level oddity could be causing these symptoms.

Thank you for the response though I appreciate it!

wnoonan avatar Feb 27 '23 15:02 wnoonan

The link isn't working anymore, so here is one from the way back machine: https://web.archive.org/web/20210731185906/https://dev.mysql.co.m/doc/internals/en/binary-protocol-value.html. In short, the <<0>> is a compressed packet, by removing the lines then, you avoid sending a compressed packet and send it in its entirety. Avoiding this optimization would ensure that versions that don't handle it correctly will work with the adapter. I believe a PR is welcome, cc @wojtekmach

@Schultzer

When I remove those lines ~T[00:00:00] will pattern match on the next method:

Screen Shot 2023-02-27 at 09 36 50

To be clear I don't think this is a problem specifically with the myxql adapter as I can save and retrieve ~T[00:00:00] against the same database with a locally (arm64/m1) built container with the exact same versions of everything. Because of that I don't think a pull request is necessary.

My intention was to shed light on what makes it work for me in the hopes that someone with a firm understanding of what is happening at the adapter/sql level may be able to point me to what system level oddity could be causing these symptoms.

Thank you for the response though I appreciate it!

Thanks, if you take a look at the examples in https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_binary_resultset.html, you will see that the example for the compressed packet for ~T[00:00:00] is a hexadecimal encoded packet that you can decode in Elixir

01 -- time 0d 00:00:00

iex> Base.decode16!("01")
<<1>>

As you see above, the result differs from what encode_binary_time/1 returns: <<0>>.

Taking this further, we can add tests for the examples: https://github.com/Schultzer/myxql/pull/1/commits/eb4e4846aaf91909f8ed32e09ecfb21ff8bd4b19#diff-9db240caa482271c2eedcb2153f7b43f2189bf56ab506fe711a30ed5d5c935bdR112-R124.

And we can now see that they pass for everything other than MySQL 8, which errors with a malformed packet.

Premature optimization is the root of all evil

As famously said by Donald Knuth.

We can avoid this optimization by removing the function and forcing ~T[00:00:00] not to be compressed into a packet that might be interpreted differently in different versions and systems at the small cost of 8 extra bytes.

Schultzer avatar Mar 04 '23 19:03 Schultzer

@Schultzer Thanks for digging into it. If you feel confident you found a bug and can reproduce it with a failing test and have a fix that works across versions, would you be able to submit a PR to MyXQL? What you are saying sound reasonable to me but it's been a while since I touched that project. You will get more eyes with a PR.

greg-rychlewski avatar Mar 04 '23 19:03 greg-rychlewski