rsgen-avro icon indicating copy to clipboard operation
rsgen-avro copied to clipboard

Avro 0.16 wrongly fails on date examples

Open lerouxrgd opened this issue 1 year ago • 27 comments

          Interestingly 0.16 cannot parse this schema anymore:
{
  "type": "record",
  "name": "DateLogicalType",
  "fields": [ {
    "name": "birthday",
    "type": {"type": "int", "logicalType": "date"},
    "default": 1681601653
  } ]
}

It says: default's value type of field "birthday" in "DateLogicalType" must be "{\"type\":\"int\",\"logicalType\":\"date\"}"

However this one is fine:

{
  "type": "record",
  "name": "DateLogicalType",
  "fields": [ {
    "name": "release_datetime_micro",
    "type": {"type": "long", "logicalType": "timestamp-micros"},
    "default": 1570903062000000
  } ]
}

@martin-g Do you know where I should report this ?

Originally posted by @lerouxrgd in https://github.com/lerouxrgd/rsgen-avro/issues/60#issuecomment-1884938904

lerouxrgd avatar Jan 15 '24 14:01 lerouxrgd

@lerouxrgd Indeed, it seems that the apache-avro project on GitHub does not use the issue tracking system of GitHub. After some clicking around, it seems that all issues are managed in Jira by the Apache Software Foundation ASF, which can be found here. At the top of that page, you can find a self serve sign up link. After you have an account, I hope you can submit that issue there.

I've requested an account, but it seems to take some time...

FYI I had a similar issue as the one you've described. Below schema, although fine in Kafka, could not be parsed in rsgen-avro. The culprit is the logicalType, presumably because the field can also be null.

{
    "type": "record",
    "name": "TimeManagement",
    "namespace": "eu.driver.model.sim.config",
    "doc": "The time management message can be used for informing connected applications on time progression and changes. *Copyright (C) 2019-2020 XVR Simulation B.V., Delft, The Netherlands, Martijn Hendriks <hendriks @ xvrsim.com>. This file is licensed under the MIT license : https://github.com/DRIVER-EU/avro-schemas/blob/master/LICENSE*",
    "fields": [
        {
            "name": "state",
            "type": {
                "type": "enum",
                "name": "TimeState",
                "doc": "Initialization – preparing for the actual start of the simulation time; Started – the simulation time is started; Paused – the simulation time is paused; Stopped – the simulation time is stopped; Reset – the simulation time is reset",
                "symbols": [
                    "Initialization",
                    "Started",
                    "Paused",
                    "Stopped",
                    "Reset"
                ]
            },
            "doc": "State the time is currently in"
        },
        {
            "name": "tags",
            "type": [
                "null",
                {
                    "type": "map",
                    "values": "string"
                }
            ],
            "doc": "Optional map containing session time specific information: key – unique name of the specific property; value – value of that property",
            "default": null
        },
        {
            "name": "timestamp",
            "type": [
                "null",
                "long"
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
            "default": null,
            "logicalType": "timestamp-millis"
        },
        {
            "name": "simulationTime",
            "type": [
                "null",
                "long"
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the fictive date and time the session should run with",
            "default": null,
            "logicalType": "timestamp-millis"
        },
        {
            "name": "simulationSpeed",
            "type": [
                "null",
                "float"
            ],
            "doc": "Optional speed factor this session wants to run a simulation. The range of this speed factor is [0, infinity)",
            "default": null
        }
    ]
}

erikvullings avatar Feb 20 '24 16:02 erikvullings

Hello, I have actually opened an issue on their Jira: https://issues.apache.org/jira/browse/AVRO-3928 As far as I know @martin-g fixed it already, now we just have to wait for a new release of the Rust SDK.

lerouxrgd avatar Feb 20 '24 16:02 lerouxrgd

Yes! This must be fixed in main branch! Please test it before we make a release!

On Tue, 20 Feb 2024 at 18:28, Romain Leroux @.***> wrote:

Hello, I have actually opened an issue on their Jira: https://issues.apache.org/jira/browse/AVRO-3928 As far as I know @martin-g https://github.com/martin-g fixed it already, now we just have to wait for a new release of the Rust SDK.

— Reply to this email directly, view it on GitHub https://github.com/lerouxrgd/rsgen-avro/issues/61#issuecomment-1954590169, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABYUQSWSZ2IDGNPI525TJ3YUTFMXAVCNFSM6AAAAABB3KLDUWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJUGU4TAMJWHE . You are receiving this because you were mentioned.Message ID: @.***>

martin-g avatar Feb 20 '24 18:02 martin-g

@martin-g Do you expect that this fix will also solve my issue?

erikvullings avatar Feb 21 '24 10:02 erikvullings

Please test your use case against main branch and report an issue if it does not work! The easiest way is to use git dependency! Or clone the repo and test it as a unit test.

martin-g avatar Feb 21 '24 10:02 martin-g

I have tried to test it by updating Cargo.toml as follows:

[dependencies]
apache-avro = { version = "0.17", features = ["derive"] }

[patch.crates-io]
apache-avro = { git = "https://github.com/apache/avro.git" }

However there are some breaking changes to figure out first.

lerouxrgd avatar Feb 23 '24 21:02 lerouxrgd

Giving it a quick try, it looks like BigDecimal is not re-exported as part of the public API (contrarily to say apache_avro::Duration) , I think it should be re-exported so that consumers can actually match against it etc.

lerouxrgd avatar Feb 23 '24 22:02 lerouxrgd

@martin-g If BigDecimal gets re-exposed through apache_avro::BigDecimal then everything should be fine. For now I have pushed my fixes to the branch apache-avro-0.17, I will merge it into master when apache-avro gets updated.

@erikvullings As for your example it now generates the following Rust code:


/// Initialization – preparing for the actual start of the simulation time; Started – the simulation time is started; Paused – the simulation time is paused; Stopped – the simulation time is stopped; Reset – the simulation time is reset
#[derive(Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Clone, serde::Deserialize, serde::Serialize)]
pub enum TimeState {
    Initialization,
    Started,
    Paused,
    Stopped,
    Reset,
}

/// The time management message can be used for informing connected applications on time progression and changes. *Copyright (C) 2019-2020 XVR Simulation B.V., Delft, The Netherlands, Martijn Hendriks <hendriks @ xvrsim.com>. This file is licensed under the MIT license : https://github.com/DRIVER-EU/avro-schemas/blob/master/LICENSE*
#[derive(Debug, PartialEq, Clone, serde::Deserialize, serde::Serialize)]
pub struct TimeManagement {
    pub state: TimeState,
    #[serde(default = "default_timemanagement_tags")]
    pub tags: Option<::std::collections::HashMap<String, String>>,
    #[serde(default = "default_timemanagement_timestamp")]
    pub timestamp: Option<i64>,
    #[serde(rename = "simulationTime")]
    #[serde(default = "default_timemanagement_simulation_time")]
    pub simulation_time: Option<i64>,
    #[serde(rename = "simulationSpeed")]
    #[serde(default = "default_timemanagement_simulation_speed")]
    pub simulation_speed: Option<f32>,
}

#[inline(always)]
fn default_timemanagement_tags() -> Option<::std::collections::HashMap<String, String>> { None }

#[inline(always)]
fn default_timemanagement_timestamp() -> Option<i64> { None }

#[inline(always)]
fn default_timemanagement_simulation_time() -> Option<i64> { None }

#[inline(always)]
fn default_timemanagement_simulation_speed() -> Option<f32> { None }

Does that look good to you ?

lerouxrgd avatar Feb 24 '24 18:02 lerouxrgd

By the way, I think that the timestamp fields should be defined as:

        {
            "name": "timestamp",
            "type": [
                "null",
                {"type": "long", "logicalType": "timestamp-millis"}
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
            "default": null
        }

Instead of what you currently have, that is:

        {
            "name": "timestamp",
            "type": [
                "null",
                "long"
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
            "default": null,
            "logicalType": "timestamp-millis"
        }

lerouxrgd avatar Feb 25 '24 02:02 lerouxrgd

@lerouxrgd Is there a failing test in branch apache-avro-0.17 ? The build and tests pass here. Could you please add the problematic code as a test case, so I can use it to verify the fix in apache_avro! Thanks!

martin-g avatar Feb 26 '24 08:02 martin-g

I am trying with the following schema:

{
  "type": "record",
  "name": "Decimals",
  "fields": [ {
    "name": "a_decimal",
    "type": {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
  }, {
    "name": "a_big_decimal",
    "type": ["null", {"type": "bytes", "logicalType": "big-decimal"}],
    "default": null
  } ]
}

Which for now generates the following Rust code:

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
    pub a_decimal: apache_avro::Decimal,
    #[serde(default = "default_decimals_a_big_decimal")]
    pub a_big_decimal: Option<apache_avro::BigDecimal>,
}

#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }

However there are a few things that prevent it from compiling:

  • As mentioned earlier apache_avro::BigDecimal does not exist
  • apache_avro::Decimal is not Eq, however the underlying num_bigint::BigInt is Eq
  • Surprisingly, to me at least, apache_avro::Decimal is not Serialize nor Deserialize

lerouxrgd avatar Feb 26 '24 11:02 lerouxrgd

What do you think about adding examples/ to rsgen-avro that uses a build.rs to generate the Rust types and a dummy main.rs that just uses them ? It could be used by the rsgen-avro users as a blueprint how to use it as a library and it will expose such kind of problems. I could provide a PR if you like the idea!

martin-g avatar Feb 26 '24 12:02 martin-g

I am not sure whether examples/ could use a build.rs ... But if it doesn't then we could add an internal/private sub-crate instead.

martin-g avatar Feb 26 '24 12:02 martin-g

That's pretty much what is done in the tests/schemas dir already. Currently I added a test there as follows:

diff --git a/tests/schemas/decimals.avsc b/tests/schemas/decimals.avsc
new file mode 100644
index 0000000..d3c09a5
--- /dev/null
+++ b/tests/schemas/decimals.avsc
@@ -0,0 +1,12 @@
+{
+  "type": "record",
+  "name": "Decimals",
+  "fields": [ {
+    "name": "a_decimal",
+    "type": {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+  }, {
+    "name": "a_big_decimal",
+    "type": ["null", {"type": "bytes", "logicalType": "big-decimal"}],
+    "default": null
+  } ]
+}
diff --git a/tests/schemas/decimals.rs b/tests/schemas/decimals.rs
new file mode 100644
index 0000000..974d3c4
--- /dev/null
+++ b/tests/schemas/decimals.rs
@@ -0,0 +1,10 @@
+
+#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
+pub struct Decimals {
+    pub a_decimal: apache_avro::Decimal,
+    #[serde(default = "default_decimals_a_big_decimal")]
+    pub a_big_decimal: Option<apache_avro::BigDecimal>,
+}
+
+#[inline(always)]
+fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
\ No newline at end of file
diff --git a/tests/schemas/mod.rs b/tests/schemas/mod.rs
index a7a6ee8..7767d34 100644
--- a/tests/schemas/mod.rs
+++ b/tests/schemas/mod.rs
@@ -1,4 +1,5 @@
 pub mod complex;
+pub mod decimals;
 pub mod enums;
 pub mod enums_casing;
 pub mod enums_multiline_doc;

lerouxrgd avatar Feb 26 '24 12:02 lerouxrgd

OK! Thanks! I will take a deeper look tomorrow!

If you want you can push this failing test to apache-avro-0.17 branch! Otherwise I will apply it locally to test my fixes!

martin-g avatar Feb 26 '24 12:02 martin-g

Sure, I have force pushed it to apache-avro-0.17 branch.

lerouxrgd avatar Feb 26 '24 13:02 lerouxrgd

BigDecimal re-export:

  • https://issues.apache.org/jira/browse/AVRO-3948
  • https://github.com/apache/avro/pull/2771

martin-g avatar Feb 27 '24 09:02 martin-g

Decimal serde and Eq:

  • https://issues.apache.org/jira/browse/AVRO-3949
  • https://github.com/apache/avro/pull/2772

martin-g avatar Feb 27 '24 09:02 martin-g

I think I fixed all issues! But now it seems the new test has its own problem:


failures:
    gen_decimals

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 27 filtered out; finished in 0.20s


--- STDERR:              rsgen-avro::generation gen_decimals ---
thread 'gen_decimals' panicked at tests/generation.rs:19:5:
assertion failed: `(left == right)`: 

>>>>>>>>>>>>>>>>> Expected: 

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
    pub a_decimal: apache_avro::Decimal,
    #[serde(default = "default_decimals_a_big_decimal")]
    pub a_big_decimal: Option<apache_avro::BigDecimal>,
}

#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
>>>>>>>>>>>>>>>>> But generated: 

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
    pub a_decimal: apache_avro::Decimal,
    #[serde(default = "default_decimals_a_big_decimal")]
    pub a_big_decimal: Option<apache_avro::BigDecimal>,
}

#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }


Diff < left / right > :
 
 #[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
 pub struct Decimals {
     pub a_decimal: apache_avro::Decimal,
     #[serde(default = "default_decimals_a_big_decimal")]
     pub a_big_decimal: Option<apache_avro::BigDecimal>,
 }
 
 #[inline(always)]
 fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
>

martin-g avatar Feb 27 '24 10:02 martin-g

Thanks for the fixes @martin-g ! It all looks good now (I've fixed the extra \n in my unit test).

Besides, I was thinking that maybe apache-avro should re-export all the types it uses it its public API, which means also apache_avro::Uuid, what do you think ?

lerouxrgd avatar Feb 27 '24 19:02 lerouxrgd

I think it makes a perfect sense! Let me do it!

martin-g avatar Feb 28 '24 08:02 martin-g

Done!

What about the time/date/duration types ? Currently they are backed by plain integers but maybe we should use Instant/Duration/unix_ts types ?!

martin-g avatar Feb 28 '24 09:02 martin-g

Awesome, thanks ! Actually I think all other types are re-exported already, I mostly looked at the types used in apache_avro::types::Value. So it all looks good to me.

lerouxrgd avatar Feb 28 '24 13:02 lerouxrgd

@lerouxrgd Thanks for the feedback on my schema! Indeed, when rewriting the schema as you suggested (which also made complete sense to me), it already works with the current rsgen-avro implementation. So that's great. Thank you and @martin-g for helping me out! Really appreciated.

erikvullings avatar Mar 04 '24 09:03 erikvullings

Done!

What about the time/date/duration types ? Currently they are backed by plain integers but maybe we should use Instant/Duration/unix_ts types ?!

@martin-g Actually I could think of one more re-export that would be useful, that is serde_bytes.

In #38 we discussed using serde_bytes for Value::Fixed as it is already done for Value::Bytes (following GH-36). It was blocked on serde-rs/bytes#26 but now it is merged so we can implement it.

Ideally rsgen-avro would use it as #[serde(with = "apache_avro::serde_bytes")] instead of the current #[serde(with = "serde_bytes")]. What do you think about it ?

lerouxrgd avatar Mar 24 '24 20:03 lerouxrgd

I will take a look soon!

martin-g avatar Mar 26 '24 07:03 martin-g

Ideally rsgen-avro would use it as #[serde(with = "apache_avro::serde_bytes")] instead of the current #[serde(with = "serde_bytes")]. What do you think about it ?

Looks good! But I think there is more work to add proper support for serde_bytes to apache_avro. As you may have seen https://github.com/apache/avro/pull/1900 is not finished ... I don't remember now what was the actual issue with deserialization... I will need to refresh my memory but I won't have time for it in the next week or so. Any help is welcome!

martin-g avatar Mar 28 '24 14:03 martin-g

With release 0.14.0 the following have been fixed:

  • [x] Fix date example parsing (fixed in apache-avro)
  • [x] Support for BigDecimal (re-exported from apache-avro)
  • [x] Support for Uuid (re-exported from apache-avro)
  • [x] Better fixed/byte values support (through apache-avro specialized serde functions)

lerouxrgd avatar Aug 09 '24 12:08 lerouxrgd