flatbuffers
flatbuffers copied to clipboard
Rust: Implement Vectors of Unions
Reach parity with C++, Java, etc.
This issue has been automatically marked as stale because it has not had activity for 1 year. It will be automatically closed if no further activity occurs. To keep it open, simply post a new comment. Maintainers will re-open on new activity. Thank you for your contributions.
This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.
Is this feature finished? I have been learning Rust recently, maybe I can contribute.
Nope, I don't think so. Please do! I can do code review
I am looking for this feature as well. This should not be closed. Can this be reopened?
Hi, there hasn't been any movement regarding this issue, so I decided to give it a go. I have got initial implementation that I have tested serialising in a Rust client and deserialising in a C++ client and vice versa. I haven't implemented a Verifier for a new type yet and I am hoping to commit some tests some time this week.
Hi @CasperN, @rw,
Implementing a Verifier
for vectors of unions will be a bit tricky (as it is for a single union fields) and it seems I cannot avoid changing Rust's flatbuffers runtime lib.
I will probably add some method like visit_vector_of_unions
to a TableVerifier
implementation and hopefully I will be able to reuse existing visit_union
.
Before I progress, I would like to make sure that APIs I added make sense. Would you be able to verify my existing code, please?
In particular I have changed:
@@ -1032,7 +1044,7 @@ class RustGenerator : public BaseGenerator {
return WrapUOffsetsVector("&" + lifetime + " str");
}
case ftVectorOfUnionValue: {
- return WrapUOffsetsVector("flatbuffers::Table<" + lifetime + ">");
+ return WrapUOffsetsVector("flatbuffers::UnionWIPOffset");
}
}
return "INVALID_CODE_GENERATION"; // for return analysis
so that it is consitent with a single union field scenario.
If you would like to see usage, please see my little demo repo kajdam/flatbuffers-rust-vector-of-unions-example
Once confirmed that APIs for my new stuff look OK, I will take care of Verifier
implementation
Just for reference and if it would help. I implemented it in the swift version of the verifier (not merged yet), which is relatively based on the rust verifier. here are some links that you might find useful. It was the easiest approach I found to verify vector of unions in UnionVector. This is called from a function similar to what you already made called visitUnionVector in TableVerifier.
I think the union value type just has to match the underlying table's as_union_value
method for building to work correctly. So long as there aren't "significant visible changes" to the tests and samples, feel free to change the generated code.
Also, I agree that you'll have to implement a method like visit_vector_of_unions
since visit_field
assumes the type is one field, which isn't true with unions.
Matching single unions makes sense to me, however I do have a big picture concern: At risk of expanding the scope, the union implementation could use some work to be statically safer. The compiler can't stop you from giving the wrong discriminant or even passing a table unrelated to the union into the union field! If we want to rework the whole thing, we should design for both unions and vectors of unions.
I think @rw started work in this direction but never finished. We do create unique offsets for our unions
https://github.com/google/flatbuffers/blob/ffc2ef77caeb6b04e89db952c65216982db44698/samples/monster_generated.rs#L200
yet we seem to use flatbuffers::UnionWIPOffset
everywhere instead
Agreed. The fact that unions' discriminants are pushed to the table independently is far from ideal.
Native object API and pack()
comes to the rescue, but incurs a performance hit.
Shall we handle vectors of unions and a new API for determinants as a separate more manageable chunks of work or expand the scope of this issue?
Shall we handle vectors of unions and a new API for determinants as a separate more manageable chunks of work or expand the scope of this issue?
Its up to you. I'm fine with doing it in one big PR since there may be overlapping concerns and design choices.
Cool, I have had a look at it and got some ideas how to build a type safety around it. On it.
The feature itself is ready, but things to note are:
- it breaks backward compatibility due to introduction of typed union offsets. API for vector of unions is new, but affects how single union instance is used
-
UnionWIPOffset
as well asWIPOffset<T>::as_union()
become obsolete and should be removed altogether - code generation might need some fixing on the go to make sure namespaces are correct - but hopefully tests will pick up all of those
- tests are failing because they have not been updated -
.rs.
files should be generated again from.fbs
schemas
Regarding the last one, how test files are generated?
Is running generate_code.sh
followed by RustTest.sh
enough? Is there anything I should be aware of?
it breaks backward compatibility due to introduction of typed union offsets. API for vector of unions is new, but affects how single union instance is used
Should be fine, hopefully its only broken code that's broken, but we are relatively unstable anyway.
Is running generate_code.sh followed by RustTest.sh enough?
Yep, and of course make flatc
before that
All existing tests are passing now. Test cases for vector of unions are yet to be be added.
I also need to spend some time to think what would be the best way to handle union aliases with ambiguous types. I wanted a discriminant to be associated with an underlying variant type, but that falls short when there is ambiguity.
https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test.fbs#L37
https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test_generated.rs#L992-L1000
In above case only one function is generated to avoid multiple definition and it always returns the first discriminant.
As opposed to a case with unique types where things work really nice
https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test.fbs#L36
https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test_generated.rs#L748-L772
First match seems kind of arbitrary, we should just generate more constructors and have users pick the right one. So, I don't think from_value_offset
is the right idea
something like
impl AnyUniqueAliases {
fn create_m1(m: WIPOffset<Monster>) -> AnyUniqueAliasesUnionTableOffset;
fn create_m2(m: WIPOffset<Monster>) -> AnyUniqueAliasesUnionTableOffset;
fn create_m3(m: WIPOffset<Monster>) -> AnyUniqueAliasesUnionTableOffset;
unsafe fn create_unknown_variant(tag: AnyAmbiguousAliases, table: WIPOffset<Table>) -> AnyUniqueAliasesUnionTableOffset;
}
should provide sufficient type safety while avoiding ambiguity
what do you think?
Yes, it is probably worth keeping it simple. I introduced that trait early on as I thought it would allow creating vectors of unions in a more generic way like here: https://github.com/kajdam/flatbuffers-rust-vector-of-unions-example/blob/e5574df7ed98785be1000ed347601a388d0dd3b4/src/main.rs#L43-L57
But the idea with generated constructors makes more sense when aliases and ambiguous types are taken into account.
Having said that I wouldn't rush adding that unsafe function to create an unknown variant unless proven absolutely necessary. It can be added later if really needed. Does it make sense? @CasperN Did you have any particular use case in mind?
Did you have any particular use case in mind?
No, nothing good. Feel free to disregard it.
btw, it seems like there'd be an analogous API for vector of unions
// minor aside, I'd like to move things to associated types where possible since its more "rust-y"
let mut b = AnyAmbiguousAliases::VectorBuilder::new(&mut fbb);
b.push_m1(monster);
b.push_m2(monster);
b.push_m3(monster);
let wip_vector: UnionVectorWipOffset<AnyAmbiguousAliases> = b.finish();
But I just realized an important constraint: Its hard write the discriminants-vector and tables-vector at the same time if the data is coming in interleaved. We may need to
- have an internal
Vec
in theVectorBuilder
which goes against our promise of zero allocations after warm up, - ask the user for the exact vector length up front so we can write,
- do some clever shuffling to dis-interleave writes,
- put the internal
Vec
in the FlatbufferBuilder so it can be reused over time
1 and 2 seem unacceptable. 3 is hard and possibly slow. I think 4 might be the way to go.
It seems solution 4. is a way to go. In a similar way as it is done for shared strings
https://github.com/google/flatbuffers/blob/4033ff5892ed6a5e083811fa13c7f9094d46d5db/rust/flatbuffers/src/builder.rs#L57
the difference being that here it would need to be Vec<u8>
to accommodate for any union discriminant.
Alternative could be storing discriminants at the front of self.owned_buf
in a builder (given that capacity for them would be needed anyway) and copying them into the right position when finish()
is called.
This gets a bit clunky when self.owned_buf
gets re-allocated, but does not require extra memory and copying has to be done anyway, regardless whether it is an external buffer or shifting within the same one.
Any thoughts?
Is moving towards associated types planned across the board? Assuming we would have a struct as follows
// somewhere in .fbs schema definition
struct ThreeBytes {
a: byte;
b: byte;
c: byte;
}
The example below will not work because initial alignment (padding) is dependent on the total number of elements.
// This will not work
let mut b = ThreeBytes::VectorBuilder::new(&mut fbb);
b.push(&struct1);
b.push(&struct2);
b.push(&struct3);
let struct_vector: flatbuffers::WIPOffset<flatbuffers::Vector<'a, ThreeBytes>> = b.finish();
This is the reason builder's start_vector
has to take the number of elements
https://github.com/google/flatbuffers/blob/4033ff5892ed6a5e083811fa13c7f9094d46d5db/rust/flatbuffers/src/builder.rs#L219
We can get away with that problem for vectors of tables, unions and strings as they are effectively vectors of offsets and have natural alignment which could be easily determined when the first element is inserted.
That would lead to two interfaces, one for vectors of items stored inline where number of elements has to be known upfront and the other one for vectors of offsets. However, given that user has to store underlying tables prior to building a vector of offsets, they likely already know the number of elements anyway. For that reason,
- ask the user for the exact vector length up front so we can write,
perhaps is still in scope.
Is moving towards associated types planned across the board?
Its just a thing that I want to do for Rust. I deprecated the ENUM_VAL
constants and made them associated constants.
The example below will not work because initial alignment (padding) is dependent on the total number of elements. ... perhaps is still in scope.
aw dang it, ok yea
This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.
Not stale.
This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.
Still not stale
This would make my schema so much simpler. Looking forward to this being implemented eventually
I got @0x4d0x4b patched library working and merged with the latest flatbuffers
version. Till now it seems to be working correctly.
What was missing in @0x4d0x4b work to be accepted?
I haven't reviewed it, but we can start the PR process