avro
avro copied to clipboard
AVRO-3631: [Rust] Add support for (de)serializing Rust byte arrays to Avro values
What is the purpose of the change
To make it possible to serialize Rust structs to Avro values (types::Value).
Verifying this change
This change added tests and can be verified as follows:
- new unit and integration tests have been added
Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? Rustdoc
@martin-g Couple small conflicts when merging this with AVRO-3683.
In the absence of Serde-Bytes #28, Could this be made to use the new serde-bytes-array crate and included in the release ?
This issue with fixed is blocking use of AVRO on my side.
This issue with fixed is blocking use of AVRO on my side.
If you only need deserialization ATM (because serialization it's not implemented yet in that rework), you may want to give a test to the redesign considered at https://issues.apache.org/jira/browse/AVRO-3714?focusedCommentId=17696183&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17696183, where that is supported (in addition to x10 perf improvements)
I need both serialization and deserialization, using avro-datum (no embeded schemas) for complex multifile schemas (see the recently merged AVRO-3683). Also fixed length arrays, and multi-value nullable unions. If the redesign can do all that, happy to look !
I need both serialization and deserialization, using avro-datum (no embeded schemas) for complex multifile schemas (see the recently merged AVRO-3683). Also fixed length arrays, and multi-value nullable unions. If the redesign can do all that, happy to look !
Everything you mentioned should be supported, however only on the deserialization side for (serialization is still a WIP).
- Multifile schemas because schema parsing uses that of apache-avro.
- Fixed length arrays & multi-value nullable unions as described at https://docs.rs/serde_avro_fast/0.1.0/serde_avro_fast/schema/enum.SchemaNode.html#variant.Union
- Efficient deserialization of byte arrays via
serde_bytes(in particular you can deserialize directly to &[u8] with zero alloc.)
Since you have such a complex use-case, it would likely be of great help for AVRO-3714 if you could give it a try.
@martin-g I tested this along with release version of serde-bytes and serde-byte-array.
All works fine for my use cases for Fixed arrays. +1 for merge from me.
I noticed I had to use serde-bytes for vec
noticed I had to use serde-bytes for vec<8> but serde-byte-array for [u8; 16] Assume this is correct?
It is a mess but yes :-/
Its fine. Works.
If the redesign can do all that, happy to look !
@markfarnan
Happy to say that I've just released it as serde_avro_fast 0.2.0 with serialization support (which was the only thing missing so that it would be in a state where it "can do all that"). (Also fixes basically all currently open issues of apache_avro and has x10 perf)
Please let me know if there's anything I can do to help with testing or if you encounter any issue! Looking forward to hearing from you :)
I had a quick look at this, is there more to it than simply using serde_bytes on struct fields ?
Something like this patch
diff --git a/lang/rust/Cargo.lock b/lang/rust/Cargo.lock
index 8fce3d635..572e7bdc0 100644
--- a/lang/rust/Cargo.lock
+++ b/lang/rust/Cargo.lock
@@ -92,6 +92,7 @@ dependencies = [
"regex-lite",
"rstest",
"serde",
+ "serde_bytes",
"serde_json",
"serial_test",
"sha2",
@@ -1175,6 +1176,15 @@ dependencies = [
"serde_derive",
]
+[[package]]
+name = "serde_bytes"
+version = "0.11.15"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "387cc504cb06bb40a96c8e04e951fe01854cf6bc921053c954e4a606d9675c6a"
+dependencies = [
+ "serde",
+]
+
[[package]]
name = "serde_derive"
version = "1.0.204"
diff --git a/lang/rust/Cargo.toml b/lang/rust/Cargo.toml
index a6fd2ad48..e6b8b5767 100644
--- a/lang/rust/Cargo.toml
+++ b/lang/rust/Cargo.toml
@@ -43,6 +43,7 @@ documentation = "https://docs.rs/apache-avro"
[workspace.dependencies]
log = { default-features = false, version = "0.4.22" }
serde = { default-features = false, version = "1.0.204", features = ["derive"] }
+serde_bytes = { default-features = false, version = "0.11.15", features = ["std"] }
serde_json = { default-features = false, version = "1.0.120", features = ["std"] }
[profile.release.package.hello-wasm]
diff --git a/lang/rust/avro/Cargo.toml b/lang/rust/avro/Cargo.toml
index 27728bcf6..df5874d4e 100644
--- a/lang/rust/avro/Cargo.toml
+++ b/lang/rust/avro/Cargo.toml
@@ -64,6 +64,7 @@ log = { workspace = true }
num-bigint = { default-features = false, version = "0.4.6", features = ["std", "serde"] }
regex-lite = { default-features = false, version = "0.1.6", features = ["std", "string"] }
serde = { workspace = true }
+serde_bytes = { workspace = true }
serde_json = { workspace = true }
snap = { default-features = false, version = "1.1.0", optional = true }
strum = { default-features = false, version = "0.26.3" }
diff --git a/lang/rust/avro/src/lib.rs b/lang/rust/avro/src/lib.rs
index a68240600..3c31c1584 100644
--- a/lang/rust/avro/src/lib.rs
+++ b/lang/rust/avro/src/lib.rs
@@ -872,6 +872,7 @@ pub use reader::{
};
pub use schema::{AvroSchema, Schema};
pub use ser::to_value;
+pub use serde_bytes;
pub use util::{max_allocation_bytes, set_serde_human_readable};
pub use uuid::Uuid;
pub use writer::{
diff --git a/lang/rust/avro/tests/avro-3631.rs b/lang/rust/avro/tests/avro-3631.rs
new file mode 100644
index 000000000..caa361eaf
--- /dev/null
+++ b/lang/rust/avro/tests/avro-3631.rs
@@ -0,0 +1,108 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use apache_avro::types::Value;
+use apache_avro::{from_value, serde_bytes, to_value, Schema};
+use serde::{Deserialize, Serialize};
+
+#[test]
+fn avro_3631_test_serialize_fixed_fields() {
+ #[derive(Debug, Serialize, Deserialize)]
+ struct TestStructFixedField<'a> {
+ #[serde(with = "serde_bytes")]
+ bytes_field: &'a [u8],
+
+ #[serde(with = "serde_bytes")]
+ vec_field: Vec<u8>,
+
+ #[serde(with = "serde_bytes")]
+ fixed_field: [u8; 6],
+ }
+
+ let test = TestStructFixedField {
+ bytes_field: &[1, 2, 3],
+ fixed_field: [1; 6],
+ vec_field: vec![2, 3, 4],
+ };
+ let value: Value = to_value(test).unwrap();
+ let schema = Schema::parse_str(
+ r#"
+ {
+ "type": "record",
+ "name": "TestStructFixedField",
+ "fields": [
+ {
+ "name": "bytes_field",
+ "type": "bytes"
+ },
+ {
+ "name": "vec_field",
+ "type": "bytes"
+ },
+ {
+ "name": "fixed_field",
+ "type": {
+ "name": "fixed_field",
+ "type": "fixed",
+ "size": 6
+ }
+ }
+ ]
+ }
+ "#,
+ )
+ .unwrap();
+ assert!(value.validate(&schema));
+}
+
+#[test]
+fn avro_3631_deserialize_value_to_struct_with_byte_types() {
+ #[derive(Debug, Deserialize, PartialEq)]
+ struct TestStructFixedField {
+ /// borrowed slices are not supported !
+ // #[serde(with = "serde_bytes", borrow)]
+ // slice_field: &'a [u8],
+
+ /// borrowed byte arrays are not supported !
+ // #[serde(with = "serde_byte", borrow)]
+ // borrowed_fixed_field: &'a [u8; 16],
+
+ #[serde(with = "serde_bytes")]
+ vec_field: Vec<u8>,
+
+ #[serde(with = "serde_bytes")]
+ fixed_field: [u8; 6],
+ }
+
+ let expected = TestStructFixedField {
+ // slice_field: &[1, 11, 111],
+ // borrowed_fixed_field: &[2; 16],
+ vec_field: vec![3, 33],
+ fixed_field: [1; 6],
+ };
+
+ let value = Value::Record(vec![
+ // ("slice_field".to_owned(), ??? ) // needs something like Value::BytesSlice(&[u8])
+ // ("borrowed_fixed_field".to_owned(), ??? ) // needs something like Value::FixedSlice(&[u8; 6])
+ ("vec_field".to_owned(), Value::Bytes(vec![3, 33])),
+ (
+ "fixed_field".to_owned(),
+ Value::Fixed(6, Vec::from(expected.fixed_field)),
+ ),
+ ]);
+ assert_eq!(expected, from_value(&value).unwrap());
+}
@lerouxrgd Please send the patch as a Pull Request, so you get your credits! Otherwise please let me know and I will apply it manually!
@martin-g You can apply it manually ! I basically took your unit tests and just used serde_bytes on the appropriate fields so you should have the credits =)
@lerouxrgd 0.17 is very close! I may not have time to apply it before the cut! If you have time to send it as a PR I will find time to review and merge it!
@martin-g I have prepared GH-3027 for this !