rsgen-avro
rsgen-avro copied to clipboard
Avro 0.16 wrongly fails on date examples
Interestingly 0.16 cannot parse this schema anymore:
{
"type": "record",
"name": "DateLogicalType",
"fields": [ {
"name": "birthday",
"type": {"type": "int", "logicalType": "date"},
"default": 1681601653
} ]
}
It says:
default's value type of field "birthday" in "DateLogicalType" must be "{\"type\":\"int\",\"logicalType\":\"date\"}"
However this one is fine:
{
"type": "record",
"name": "DateLogicalType",
"fields": [ {
"name": "release_datetime_micro",
"type": {"type": "long", "logicalType": "timestamp-micros"},
"default": 1570903062000000
} ]
}
@martin-g Do you know where I should report this ?
Originally posted by @lerouxrgd in https://github.com/lerouxrgd/rsgen-avro/issues/60#issuecomment-1884938904
@lerouxrgd Indeed, it seems that the apache-avro project on GitHub does not use the issue tracking system of GitHub. After some clicking around, it seems that all issues are managed in Jira by the Apache Software Foundation ASF, which can be found here. At the top of that page, you can find a self serve sign up link. After you have an account, I hope you can submit that issue there.
I've requested an account, but it seems to take some time...
FYI I had a similar issue as the one you've described. Below schema, although fine in Kafka, could not be parsed in rsgen-avro
. The culprit is the logicalType
, presumably because the field can also be null
.
{
"type": "record",
"name": "TimeManagement",
"namespace": "eu.driver.model.sim.config",
"doc": "The time management message can be used for informing connected applications on time progression and changes. *Copyright (C) 2019-2020 XVR Simulation B.V., Delft, The Netherlands, Martijn Hendriks <hendriks @ xvrsim.com>. This file is licensed under the MIT license : https://github.com/DRIVER-EU/avro-schemas/blob/master/LICENSE*",
"fields": [
{
"name": "state",
"type": {
"type": "enum",
"name": "TimeState",
"doc": "Initialization – preparing for the actual start of the simulation time; Started – the simulation time is started; Paused – the simulation time is paused; Stopped – the simulation time is stopped; Reset – the simulation time is reset",
"symbols": [
"Initialization",
"Started",
"Paused",
"Stopped",
"Reset"
]
},
"doc": "State the time is currently in"
},
{
"name": "tags",
"type": [
"null",
{
"type": "map",
"values": "string"
}
],
"doc": "Optional map containing session time specific information: key – unique name of the specific property; value – value of that property",
"default": null
},
{
"name": "timestamp",
"type": [
"null",
"long"
],
"doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
"default": null,
"logicalType": "timestamp-millis"
},
{
"name": "simulationTime",
"type": [
"null",
"long"
],
"doc": "Optional UNIX Epoch time in milliseconds marking the fictive date and time the session should run with",
"default": null,
"logicalType": "timestamp-millis"
},
{
"name": "simulationSpeed",
"type": [
"null",
"float"
],
"doc": "Optional speed factor this session wants to run a simulation. The range of this speed factor is [0, infinity)",
"default": null
}
]
}
Hello, I have actually opened an issue on their Jira: https://issues.apache.org/jira/browse/AVRO-3928 As far as I know @martin-g fixed it already, now we just have to wait for a new release of the Rust SDK.
Yes! This must be fixed in main branch! Please test it before we make a release!
On Tue, 20 Feb 2024 at 18:28, Romain Leroux @.***> wrote:
Hello, I have actually opened an issue on their Jira: https://issues.apache.org/jira/browse/AVRO-3928 As far as I know @martin-g https://github.com/martin-g fixed it already, now we just have to wait for a new release of the Rust SDK.
— Reply to this email directly, view it on GitHub https://github.com/lerouxrgd/rsgen-avro/issues/61#issuecomment-1954590169, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABYUQSWSZ2IDGNPI525TJ3YUTFMXAVCNFSM6AAAAABB3KLDUWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJUGU4TAMJWHE . You are receiving this because you were mentioned.Message ID: @.***>
@martin-g Do you expect that this fix will also solve my issue?
Please test your use case against main
branch and report an issue if it does not work!
The easiest way is to use git
dependency!
Or clone the repo and test it as a unit test
.
I have tried to test it by updating Cargo.toml
as follows:
[dependencies]
apache-avro = { version = "0.17", features = ["derive"] }
[patch.crates-io]
apache-avro = { git = "https://github.com/apache/avro.git" }
However there are some breaking changes to figure out first.
Giving it a quick try, it looks like BigDecimal
is not re-exported as part of the public API (contrarily to say apache_avro::Duration
) , I think it should be re-exported so that consumers can actually match against it etc.
@martin-g If BigDecimal
gets re-exposed through apache_avro::BigDecimal
then everything should be fine. For now I have pushed my fixes to the branch apache-avro-0.17, I will merge it into master when apache-avro
gets updated.
@erikvullings As for your example it now generates the following Rust code:
/// Initialization – preparing for the actual start of the simulation time; Started – the simulation time is started; Paused – the simulation time is paused; Stopped – the simulation time is stopped; Reset – the simulation time is reset
#[derive(Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Clone, serde::Deserialize, serde::Serialize)]
pub enum TimeState {
Initialization,
Started,
Paused,
Stopped,
Reset,
}
/// The time management message can be used for informing connected applications on time progression and changes. *Copyright (C) 2019-2020 XVR Simulation B.V., Delft, The Netherlands, Martijn Hendriks <hendriks @ xvrsim.com>. This file is licensed under the MIT license : https://github.com/DRIVER-EU/avro-schemas/blob/master/LICENSE*
#[derive(Debug, PartialEq, Clone, serde::Deserialize, serde::Serialize)]
pub struct TimeManagement {
pub state: TimeState,
#[serde(default = "default_timemanagement_tags")]
pub tags: Option<::std::collections::HashMap<String, String>>,
#[serde(default = "default_timemanagement_timestamp")]
pub timestamp: Option<i64>,
#[serde(rename = "simulationTime")]
#[serde(default = "default_timemanagement_simulation_time")]
pub simulation_time: Option<i64>,
#[serde(rename = "simulationSpeed")]
#[serde(default = "default_timemanagement_simulation_speed")]
pub simulation_speed: Option<f32>,
}
#[inline(always)]
fn default_timemanagement_tags() -> Option<::std::collections::HashMap<String, String>> { None }
#[inline(always)]
fn default_timemanagement_timestamp() -> Option<i64> { None }
#[inline(always)]
fn default_timemanagement_simulation_time() -> Option<i64> { None }
#[inline(always)]
fn default_timemanagement_simulation_speed() -> Option<f32> { None }
Does that look good to you ?
By the way, I think that the timestamp fields should be defined as:
{
"name": "timestamp",
"type": [
"null",
{"type": "long", "logicalType": "timestamp-millis"}
],
"doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
"default": null
}
Instead of what you currently have, that is:
{
"name": "timestamp",
"type": [
"null",
"long"
],
"doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
"default": null,
"logicalType": "timestamp-millis"
}
@lerouxrgd Is there a failing test in branch apache-avro-0.17
? The build and tests pass here.
Could you please add the problematic code as a test case, so I can use it to verify the fix in apache_avro
! Thanks!
I am trying with the following schema:
{
"type": "record",
"name": "Decimals",
"fields": [ {
"name": "a_decimal",
"type": {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
}, {
"name": "a_big_decimal",
"type": ["null", {"type": "bytes", "logicalType": "big-decimal"}],
"default": null
} ]
}
Which for now generates the following Rust code:
#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
pub a_decimal: apache_avro::Decimal,
#[serde(default = "default_decimals_a_big_decimal")]
pub a_big_decimal: Option<apache_avro::BigDecimal>,
}
#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
However there are a few things that prevent it from compiling:
- As mentioned earlier
apache_avro::BigDecimal
does not exist -
apache_avro::Decimal
is notEq
, however the underlyingnum_bigint::BigInt
isEq
- Surprisingly, to me at least,
apache_avro::Decimal
is notSerialize
norDeserialize
What do you think about adding examples/
to rsgen-avro that uses a build.rs
to generate the Rust types and a dummy main.rs that just use
s them ?
It could be used by the rsgen-avro users as a blueprint how to use it as a library and it will expose such kind of problems.
I could provide a PR if you like the idea!
I am not sure whether examples/
could use a build.rs
... But if it doesn't then we could add an internal/private sub-crate instead.
That's pretty much what is done in the tests/schemas
dir already. Currently I added a test there as follows:
diff --git a/tests/schemas/decimals.avsc b/tests/schemas/decimals.avsc
new file mode 100644
index 0000000..d3c09a5
--- /dev/null
+++ b/tests/schemas/decimals.avsc
@@ -0,0 +1,12 @@
+{
+ "type": "record",
+ "name": "Decimals",
+ "fields": [ {
+ "name": "a_decimal",
+ "type": {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+ }, {
+ "name": "a_big_decimal",
+ "type": ["null", {"type": "bytes", "logicalType": "big-decimal"}],
+ "default": null
+ } ]
+}
diff --git a/tests/schemas/decimals.rs b/tests/schemas/decimals.rs
new file mode 100644
index 0000000..974d3c4
--- /dev/null
+++ b/tests/schemas/decimals.rs
@@ -0,0 +1,10 @@
+
+#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
+pub struct Decimals {
+ pub a_decimal: apache_avro::Decimal,
+ #[serde(default = "default_decimals_a_big_decimal")]
+ pub a_big_decimal: Option<apache_avro::BigDecimal>,
+}
+
+#[inline(always)]
+fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
\ No newline at end of file
diff --git a/tests/schemas/mod.rs b/tests/schemas/mod.rs
index a7a6ee8..7767d34 100644
--- a/tests/schemas/mod.rs
+++ b/tests/schemas/mod.rs
@@ -1,4 +1,5 @@
pub mod complex;
+pub mod decimals;
pub mod enums;
pub mod enums_casing;
pub mod enums_multiline_doc;
OK! Thanks! I will take a deeper look tomorrow!
If you want you can push this failing test to apache-avro-0.17
branch! Otherwise I will apply it locally to test my fixes!
Sure, I have force pushed it to apache-avro-0.17
branch.
BigDecimal re-export:
- https://issues.apache.org/jira/browse/AVRO-3948
- https://github.com/apache/avro/pull/2771
Decimal serde and Eq
:
- https://issues.apache.org/jira/browse/AVRO-3949
- https://github.com/apache/avro/pull/2772
I think I fixed all issues! But now it seems the new test has its own problem:
failures:
gen_decimals
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 27 filtered out; finished in 0.20s
--- STDERR: rsgen-avro::generation gen_decimals ---
thread 'gen_decimals' panicked at tests/generation.rs:19:5:
assertion failed: `(left == right)`:
>>>>>>>>>>>>>>>>> Expected:
#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
pub a_decimal: apache_avro::Decimal,
#[serde(default = "default_decimals_a_big_decimal")]
pub a_big_decimal: Option<apache_avro::BigDecimal>,
}
#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
>>>>>>>>>>>>>>>>> But generated:
#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
pub a_decimal: apache_avro::Decimal,
#[serde(default = "default_decimals_a_big_decimal")]
pub a_big_decimal: Option<apache_avro::BigDecimal>,
}
#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
Diff < left / right > :
#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
pub a_decimal: apache_avro::Decimal,
#[serde(default = "default_decimals_a_big_decimal")]
pub a_big_decimal: Option<apache_avro::BigDecimal>,
}
#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
>
Thanks for the fixes @martin-g ! It all looks good now (I've fixed the extra \n in my unit test).
Besides, I was thinking that maybe apache-avro
should re-export all the types it uses it its public API, which means also apache_avro::Uuid
, what do you think ?
I think it makes a perfect sense! Let me do it!
Done!
What about the time/date/duration types ? Currently they are backed by plain integers but maybe we should use Instant/Duration/unix_ts types ?!
Awesome, thanks ! Actually I think all other types are re-exported already, I mostly looked at the types used in apache_avro::types::Value. So it all looks good to me.
@lerouxrgd Thanks for the feedback on my schema! Indeed, when rewriting the schema as you suggested (which also made complete sense to me), it already works with the current rsgen-avro
implementation. So that's great. Thank you and @martin-g for helping me out! Really appreciated.
Done!
What about the time/date/duration types ? Currently they are backed by plain integers but maybe we should use Instant/Duration/unix_ts types ?!
@martin-g Actually I could think of one more re-export that would be useful, that is serde_bytes
.
In #38 we discussed using serde_bytes
for Value::Fixed
as it is already done for Value::Bytes
(following GH-36). It was blocked on serde-rs/bytes#26 but now it is merged so we can implement it.
Ideally rsgen-avro
would use it as #[serde(with = "apache_avro::serde_bytes")]
instead of the current #[serde(with = "serde_bytes")]
. What do you think about it ?
I will take a look soon!
Ideally
rsgen-avro
would use it as#[serde(with = "apache_avro::serde_bytes")]
instead of the current#[serde(with = "serde_bytes")]
. What do you think about it ?
Looks good! But I think there is more work to add proper support for serde_bytes to apache_avro. As you may have seen https://github.com/apache/avro/pull/1900 is not finished ... I don't remember now what was the actual issue with deserialization... I will need to refresh my memory but I won't have time for it in the next week or so. Any help is welcome!
With release 0.14.0
the following have been fixed:
- [x] Fix date example parsing (fixed in
apache-avro
) - [x] Support for
BigDecimal
(re-exported fromapache-avro
) - [x] Support for
Uuid
(re-exported fromapache-avro
) - [x] Better fixed/byte values support (through
apache-avro
specialized serde functions)