flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

Rust: Implement Vectors of Unions

Open rw opened this issue 5 years ago • 32 comments

Reach parity with C++, Java, etc.

rw avatar Nov 01 '18 18:11 rw

This issue has been automatically marked as stale because it has not had activity for 1 year. It will be automatically closed if no further activity occurs. To keep it open, simply post a new comment. Maintainers will re-open on new activity. Thank you for your contributions.

stale[bot] avatar Nov 01 '19 18:11 stale[bot]

This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.

github-actions[bot] avatar Jun 15 '20 20:06 github-actions[bot]

Is this feature finished? I have been learning Rust recently, maybe I can contribute.

theidexisted avatar Dec 24 '20 08:12 theidexisted

Nope, I don't think so. Please do! I can do code review

CasperN avatar Dec 29 '20 16:12 CasperN

I am looking for this feature as well. This should not be closed. Can this be reopened?

vadixidav avatar Feb 08 '21 22:02 vadixidav

Hi, there hasn't been any movement regarding this issue, so I decided to give it a go. I have got initial implementation that I have tested serialising in a Rust client and deserialising in a C++ client and vice versa. I haven't implemented a Verifier for a new type yet and I am hoping to commit some tests some time this week.

0x4d0x4b avatar Feb 21 '21 20:02 0x4d0x4b

Hi @CasperN, @rw,

Implementing a Verifier for vectors of unions will be a bit tricky (as it is for a single union fields) and it seems I cannot avoid changing Rust's flatbuffers runtime lib. I will probably add some method like visit_vector_of_unions to a TableVerifier implementation and hopefully I will be able to reuse existing visit_union.

Before I progress, I would like to make sure that APIs I added make sense. Would you be able to verify my existing code, please?

In particular I have changed:

@@ -1032,7 +1044,7 @@ class RustGenerator : public BaseGenerator {
         return WrapUOffsetsVector("&" + lifetime + " str");
       }
       case ftVectorOfUnionValue: {
-        return WrapUOffsetsVector("flatbuffers::Table<" + lifetime + ">");
+        return WrapUOffsetsVector("flatbuffers::UnionWIPOffset");
       }
     }
     return "INVALID_CODE_GENERATION";  // for return analysis

so that it is consitent with a single union field scenario.

If you would like to see usage, please see my little demo repo kajdam/flatbuffers-rust-vector-of-unions-example

Once confirmed that APIs for my new stuff look OK, I will take care of Verifier implementation

0x4d0x4b avatar Feb 23 '21 14:02 0x4d0x4b

Just for reference and if it would help. I implemented it in the swift version of the verifier (not merged yet), which is relatively based on the rust verifier. here are some links that you might find useful. It was the easiest approach I found to verify vector of unions in UnionVector. This is called from a function similar to what you already made called visitUnionVector in TableVerifier.

swift tests

mustiikhalil avatar Feb 23 '21 15:02 mustiikhalil

I think the union value type just has to match the underlying table's as_union_value method for building to work correctly. So long as there aren't "significant visible changes" to the tests and samples, feel free to change the generated code.

Also, I agree that you'll have to implement a method like visit_vector_of_unions since visit_field assumes the type is one field, which isn't true with unions.

Matching single unions makes sense to me, however I do have a big picture concern: At risk of expanding the scope, the union implementation could use some work to be statically safer. The compiler can't stop you from giving the wrong discriminant or even passing a table unrelated to the union into the union field! If we want to rework the whole thing, we should design for both unions and vectors of unions.

I think @rw started work in this direction but never finished. We do create unique offsets for our unions

https://github.com/google/flatbuffers/blob/ffc2ef77caeb6b04e89db952c65216982db44698/samples/monster_generated.rs#L200

yet we seem to use flatbuffers::UnionWIPOffset everywhere instead

CasperN avatar Feb 23 '21 15:02 CasperN

Agreed. The fact that unions' discriminants are pushed to the table independently is far from ideal. Native object API and pack() comes to the rescue, but incurs a performance hit. Shall we handle vectors of unions and a new API for determinants as a separate more manageable chunks of work or expand the scope of this issue?

0x4d0x4b avatar Feb 23 '21 16:02 0x4d0x4b

Shall we handle vectors of unions and a new API for determinants as a separate more manageable chunks of work or expand the scope of this issue?

Its up to you. I'm fine with doing it in one big PR since there may be overlapping concerns and design choices.

CasperN avatar Feb 23 '21 21:02 CasperN

Cool, I have had a look at it and got some ideas how to build a type safety around it. On it.

0x4d0x4b avatar Feb 24 '21 12:02 0x4d0x4b

The feature itself is ready, but things to note are:

  • it breaks backward compatibility due to introduction of typed union offsets. API for vector of unions is new, but affects how single union instance is used
  • UnionWIPOffset as well as WIPOffset<T>::as_union() become obsolete and should be removed altogether
  • code generation might need some fixing on the go to make sure namespaces are correct - but hopefully tests will pick up all of those
  • tests are failing because they have not been updated - .rs. files should be generated again from .fbs schemas

Regarding the last one, how test files are generated? Is running generate_code.sh followed by RustTest.sh enough? Is there anything I should be aware of?

0x4d0x4b avatar Mar 02 '21 16:03 0x4d0x4b

it breaks backward compatibility due to introduction of typed union offsets. API for vector of unions is new, but affects how single union instance is used

Should be fine, hopefully its only broken code that's broken, but we are relatively unstable anyway.

Is running generate_code.sh followed by RustTest.sh enough?

Yep, and of course make flatc before that

CasperN avatar Mar 02 '21 20:03 CasperN

All existing tests are passing now. Test cases for vector of unions are yet to be be added.

I also need to spend some time to think what would be the best way to handle union aliases with ambiguous types. I wanted a discriminant to be associated with an underlying variant type, but that falls short when there is ambiguity.

https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test.fbs#L37

https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test_generated.rs#L992-L1000

In above case only one function is generated to avoid multiple definition and it always returns the first discriminant.

As opposed to a case with unique types where things work really nice

https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test.fbs#L36

https://github.com/kajdam/flatbuffers/blob/70a34b4349f1ba7133f485ca784f7aa75d58e00f/tests/monster_test_generated.rs#L748-L772

0x4d0x4b avatar Mar 03 '21 14:03 0x4d0x4b

First match seems kind of arbitrary, we should just generate more constructors and have users pick the right one. So, I don't think from_value_offset is the right idea

something like

impl AnyUniqueAliases {
  fn create_m1(m: WIPOffset<Monster>) -> AnyUniqueAliasesUnionTableOffset;
  fn create_m2(m: WIPOffset<Monster>) -> AnyUniqueAliasesUnionTableOffset;
  fn create_m3(m: WIPOffset<Monster>) -> AnyUniqueAliasesUnionTableOffset;
  unsafe fn create_unknown_variant(tag: AnyAmbiguousAliases, table: WIPOffset<Table>) ->  AnyUniqueAliasesUnionTableOffset;
}

should provide sufficient type safety while avoiding ambiguity

what do you think?

CasperN avatar Mar 03 '21 14:03 CasperN

Yes, it is probably worth keeping it simple. I introduced that trait early on as I thought it would allow creating vectors of unions in a more generic way like here: https://github.com/kajdam/flatbuffers-rust-vector-of-unions-example/blob/e5574df7ed98785be1000ed347601a388d0dd3b4/src/main.rs#L43-L57

But the idea with generated constructors makes more sense when aliases and ambiguous types are taken into account.

0x4d0x4b avatar Mar 03 '21 16:03 0x4d0x4b

Having said that I wouldn't rush adding that unsafe function to create an unknown variant unless proven absolutely necessary. It can be added later if really needed. Does it make sense? @CasperN Did you have any particular use case in mind?

0x4d0x4b avatar Mar 03 '21 16:03 0x4d0x4b

Did you have any particular use case in mind?

No, nothing good. Feel free to disregard it.

CasperN avatar Mar 04 '21 00:03 CasperN

btw, it seems like there'd be an analogous API for vector of unions

// minor aside, I'd like to move things to associated types where possible since its more "rust-y"
let mut b = AnyAmbiguousAliases::VectorBuilder::new(&mut fbb);
b.push_m1(monster);
b.push_m2(monster);
b.push_m3(monster);
let wip_vector: UnionVectorWipOffset<AnyAmbiguousAliases> = b.finish();

But I just realized an important constraint: Its hard write the discriminants-vector and tables-vector at the same time if the data is coming in interleaved. We may need to

  1. have an internal Vec in the VectorBuilder which goes against our promise of zero allocations after warm up,
  2. ask the user for the exact vector length up front so we can write,
  3. do some clever shuffling to dis-interleave writes,
  4. put the internal Vec in the FlatbufferBuilder so it can be reused over time

1 and 2 seem unacceptable. 3 is hard and possibly slow. I think 4 might be the way to go.

CasperN avatar Mar 04 '21 18:03 CasperN

It seems solution 4. is a way to go. In a similar way as it is done for shared strings https://github.com/google/flatbuffers/blob/4033ff5892ed6a5e083811fa13c7f9094d46d5db/rust/flatbuffers/src/builder.rs#L57 the difference being that here it would need to be Vec<u8> to accommodate for any union discriminant.

Alternative could be storing discriminants at the front of self.owned_buf in a builder (given that capacity for them would be needed anyway) and copying them into the right position when finish() is called. This gets a bit clunky when self.owned_buf gets re-allocated, but does not require extra memory and copying has to be done anyway, regardless whether it is an external buffer or shifting within the same one.

Any thoughts?

0x4d0x4b avatar Mar 04 '21 21:03 0x4d0x4b

Is moving towards associated types planned across the board? Assuming we would have a struct as follows

// somewhere in .fbs schema definition
struct ThreeBytes {
    a: byte;
    b: byte;
    c: byte;
}

The example below will not work because initial alignment (padding) is dependent on the total number of elements.

// This will not work
let mut b = ThreeBytes::VectorBuilder::new(&mut fbb);
b.push(&struct1);
b.push(&struct2);
b.push(&struct3);
let struct_vector: flatbuffers::WIPOffset<flatbuffers::Vector<'a, ThreeBytes>> = b.finish();

This is the reason builder's start_vector has to take the number of elements https://github.com/google/flatbuffers/blob/4033ff5892ed6a5e083811fa13c7f9094d46d5db/rust/flatbuffers/src/builder.rs#L219

We can get away with that problem for vectors of tables, unions and strings as they are effectively vectors of offsets and have natural alignment which could be easily determined when the first element is inserted.

That would lead to two interfaces, one for vectors of items stored inline where number of elements has to be known upfront and the other one for vectors of offsets. However, given that user has to store underlying tables prior to building a vector of offsets, they likely already know the number of elements anyway. For that reason,

  1. ask the user for the exact vector length up front so we can write,

perhaps is still in scope.

0x4d0x4b avatar Mar 05 '21 07:03 0x4d0x4b

Is moving towards associated types planned across the board?

Its just a thing that I want to do for Rust. I deprecated the ENUM_VAL constants and made them associated constants.

The example below will not work because initial alignment (padding) is dependent on the total number of elements. ... perhaps is still in scope.

aw dang it, ok yea

CasperN avatar Mar 05 '21 23:03 CasperN

This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.

github-actions[bot] avatar Sep 04 '21 20:09 github-actions[bot]

Not stale.

vadixidav avatar Sep 05 '21 00:09 vadixidav

This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.

github-actions[bot] avatar Mar 06 '22 20:03 github-actions[bot]

Still not stale

agg23 avatar Mar 06 '22 21:03 agg23

This would make my schema so much simpler. Looking forward to this being implemented eventually

TheButlah avatar Apr 05 '22 09:04 TheButlah

I got @0x4d0x4b patched library working and merged with the latest flatbuffers version. Till now it seems to be working correctly.

What was missing in @0x4d0x4b work to be accepted?

graffic avatar Sep 14 '22 13:09 graffic

I haven't reviewed it, but we can start the PR process

CasperN avatar Sep 14 '22 13:09 CasperN