rust-csv

support for writing a header for nested structs

Open ilya-epifanov opened this issue 4 years ago • 9 comments

This covers some of the #98 and #155 use cases. It makes #[serde(flatten)] unnecessary for nested structs, as CSV is a flat format anyway.

ilya-epifanov avatar Apr 15 '20 11:04 ilya-epifanov

@fmorency, @Kampfkarren, could you please check whether this covers your use cases?

ilya-epifanov avatar Apr 15 '20 11:04 ilya-epifanov

In particular, take a look at this test: https://github.com/BurntSushi/rust-csv/blob/a2f7a8d0ef9e2a020b781b0b0e4942f3ae7fd04a/src/serializer.rs#L1185

ilya-epifanov avatar Apr 15 '20 11:04 ilya-epifanov

@BurntSushi could you please take a look?

ilya-epifanov avatar Apr 27 '20 20:04 ilya-epifanov

I've tried to see if this works with my use case, which is nested structs and wishing to write the headers for them, and it didn't.

CGMossa avatar Jun 21 '20 20:06 CGMossa

The patch works for my simple use case, two nested structs. It would help me if the PR got merged.

jeremias-blendin-intel avatar Jun 23 '20 00:06 jeremias-blendin-intel

This would help here too, for flattening structs that I don't want to merge into one big struct.

0xpr03 avatar Jun 28 '20 13:06 0xpr03

This branch works fine for me for serializing nested structs. Thanks a lot @ilya-epifanov!

Luthaf avatar Nov 26 '20 09:11 Luthaf

Thank you so much @ilya-epifanov. This, however, fails when serializing nested structs wrapped in enums such as Option.

use serde::Serialize;

#[derive(Serialize)]
struct Row {
    nested_one: Option<Nested>,
    nested_two: Nested,
}

#[derive(Serialize)]
struct Nested { x: i32, y: i32 }

fn main() {
    let mut writer = csv::Writer::from_writer(std::io::stdout());
    let row = Row { nested_one: Some(Nested { x: 1, y: 2 }), nested_two: Nested { x: 3, y: 4 } };
    writer.serialize(row).unwrap();
}

This panics with Error(UnequalLengths { pos: None, expected_len: 3, len: 4 }) after writing:

nested_one,x,y
1,2,3,4

amrhassan avatar Dec 09 '20 09:12 amrhassan

Ok, @BurntSushi, this is what I got.

Summary

There are two main steps in writing the CSV: writing the header and writing the values.

Currently, the values are written properly even when they are nested in a struct. The header, on the other hand, is wrong: for a nested struct it contains the name of the field that holds the struct, not the names of the struct's own fields.

That is the problem this PR tries to solve, and that is why it mainly modifies the behaviour of struct SeHeader.

How it is implemented

Basically, this PR hacks serialize_field by making the writing of the header field conditional on the type of the field. It behaves as before for any type that is not a struct, but if the field is a struct, its name is not written. Serialization then descends into the nested fields, which write their own names.

In order to check the type, the PR adds a new field to the struct SeHeader, called structs_written (Line 451, 456-459). It counts the number of structs seen since the beginning of the writing of the header, and is incremented each time serialize_struct is called.

Therefore, if the structs_written counter changes while a field is being serialized, that field is a struct (Line 802-805).
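To make the counter trick concrete, here is a minimal std-only sketch of it. It is a toy model, not the PR's code: ToyValue, HeaderWriter, header_of and the Row/Nested types are hypothetical stand-ins for serde's Serialize and the crate's SeHeader, and only the structs_written idea is taken from the description above.

```rust
/// Hypothetical stand-in for serde's `Serialize`: a value can only
/// reveal what it is by calling back into the writer.
trait ToyValue {
    fn write_header(&self, w: &mut HeaderWriter);
}

/// Hypothetical stand-in for `SeHeader`.
struct HeaderWriter {
    structs_written: u64,
    names: Vec<String>,
}

impl HeaderWriter {
    /// Called by every struct before it serializes its fields.
    fn serialize_struct(&mut self) {
        self.structs_written += 1;
    }

    /// Writes the field name only if serializing the value did not
    /// bump the counter, i.e. the value was not a struct.
    fn serialize_field<T: ToyValue>(&mut self, name: &str, value: &T) {
        let before = self.structs_written;
        value.write_header(self);
        if self.structs_written == before {
            self.names.push(name.to_string());
        }
    }
}

// A scalar never calls `serialize_struct`.
impl ToyValue for i32 {
    fn write_header(&self, _w: &mut HeaderWriter) {}
}

struct Nested { x: i32, y: i32 }

impl ToyValue for Nested {
    fn write_header(&self, w: &mut HeaderWriter) {
        w.serialize_struct();
        w.serialize_field("x", &self.x);
        w.serialize_field("y", &self.y);
    }
}

struct Row { id: i32, pos: Nested }

impl ToyValue for Row {
    fn write_header(&self, w: &mut HeaderWriter) {
        w.serialize_struct();
        w.serialize_field("id", &self.id);
        w.serialize_field("pos", &self.pos);
    }
}

fn header_of<T: ToyValue>(value: &T) -> Vec<String> {
    let mut w = HeaderWriter { structs_written: 0, names: Vec::new() };
    value.write_header(&mut w);
    w.names
}

fn main() {
    let row = Row { id: 1, pos: Nested { x: 2, y: 3 } };
    println!("{}", header_of(&row).join(",")); // prints "id,x,y"
}
```

In this model the name "pos" is skipped because serializing it bumps the counter, while x and y write themselves.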

Problem

This, however, does not work when the struct is wrapped in an Option, as @amrhassan pointed out.

The underlying problem is that the hack does not apply here: serialize_struct is not called when the struct is inside an Option. serialize_some is called instead.
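The mismatch can be reproduced in a small std-only toy model. This is a sketch, not rust-csv code: ToyValue, HeaderWriter, header_of and cell_count are hypothetical names, and it assumes (as described above) that the header pass treats Some(_) as opaque and never reaches serialize_struct for the inner value, while the value pass does write the inner fields.

```rust
/// Hypothetical stand-in for serde's `Serialize`.
trait ToyValue {
    /// Header pass: the value calls back into the writer.
    fn write_header(&self, w: &mut HeaderWriter);
    /// Value pass: how many CSV cells the value occupies.
    fn cell_count(&self) -> usize;
}

struct HeaderWriter {
    structs_written: u64,
    names: Vec<String>,
}

impl HeaderWriter {
    fn serialize_struct(&mut self) {
        self.structs_written += 1;
    }

    fn serialize_field<T: ToyValue>(&mut self, name: &str, value: &T) {
        let before = self.structs_written;
        value.write_header(self);
        // Counter unchanged: the field is treated as one scalar column.
        if self.structs_written == before {
            self.names.push(name.to_string());
        }
    }
}

impl ToyValue for i32 {
    fn write_header(&self, _w: &mut HeaderWriter) {}
    fn cell_count(&self) -> usize { 1 }
}

struct Nested { x: i32, y: i32 }

impl ToyValue for Nested {
    fn write_header(&self, w: &mut HeaderWriter) {
        w.serialize_struct();
        w.serialize_field("x", &self.x);
        w.serialize_field("y", &self.y);
    }
    fn cell_count(&self) -> usize {
        self.x.cell_count() + self.y.cell_count()
    }
}

impl<T: ToyValue> ToyValue for Option<T> {
    // Assumption from the discussion: the header pass does not descend
    // into `Some(_)`, so the counter is left untouched.
    fn write_header(&self, _w: &mut HeaderWriter) {}
    // The value pass does write the inner value (one empty cell for `None`
    // is an assumption of this model).
    fn cell_count(&self) -> usize {
        match self {
            Some(inner) => inner.cell_count(),
            None => 1,
        }
    }
}

struct Row { nested_one: Option<Nested>, nested_two: Nested }

impl ToyValue for Row {
    fn write_header(&self, w: &mut HeaderWriter) {
        w.serialize_struct();
        w.serialize_field("nested_one", &self.nested_one);
        w.serialize_field("nested_two", &self.nested_two);
    }
    fn cell_count(&self) -> usize {
        self.nested_one.cell_count() + self.nested_two.cell_count()
    }
}

fn header_of<T: ToyValue>(value: &T) -> Vec<String> {
    let mut w = HeaderWriter { structs_written: 0, names: Vec::new() };
    value.write_header(&mut w);
    w.names
}

fn main() {
    let row = Row {
        nested_one: Some(Nested { x: 1, y: 2 }),
        nested_two: Nested { x: 3, y: 4 },
    };
    let header = header_of(&row);
    // prints "nested_one,x,y: 3 header columns, 4 value cells"
    println!("{}: {} header columns, {} value cells",
             header.join(","), header.len(), row.cell_count());
}
```

The three header columns against four value cells correspond to the expected_len: 3, len: 4 in the reported UnequalLengths error.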

We could try to patch serialize_some as well and increment structs_written there, but I did not find a working patch for it, and it would look a bit shaky.

A second problem I faced is that we cannot use #[serde(flatten)]: writing then fails with "serializing maps is not supported ...", an error raised from serialize_map. This is counter-intuitive, as flatten is the recommended way to do this with serde.

I think it would be better to redo this PR by implementing the serialize_map method, with some conditions on its input, so that nesting relies on serde's #[serde(flatten)].

haixuanTao avatar Mar 01 '21 20:03 haixuanTao