rust-csv
rust-csv copied to clipboard
Write headers with serde when the collection is empty
On csv 1.1.1
When using serde
to serialize a CSV file, the header is not written in the file if Writer::serialize()
has not been called. This might seems logical but here is my problem: I'm using a standard format based on CSV which force us to write a file with the headers, even if there is no data associated.
use csv::*;
use serde::*;
#[derive(Serialize, Deserialize)]
struct Object {
id: u32,
name: &'static str,
}
fn main() {
let object = Object {
id: 1,
name: "some name",
};
let objects: Vec<Object> = vec![object]; // XXXXX
let mut writer = WriterBuilder::default()
.has_headers(true)
.from_path("foo.csv")
.unwrap();
for o in objects {
writer.serialize(o).unwrap();
}
}
The above program will output the expected file with the header.
id,name
1,some name
But, if you replace the line annotated with XXXXX
with
let objects: Vec<Object> = vec![];
The resulting file will end up completely empty but I'd like to be able to at least produce.
id,name
Obviously, if I never tell the Writer
object to do anything, nothing will happen. But I was thinking maybe a Writer::serialize_header()
could be provided? In the above piece of code, I could call writer.serialize_header()
just above the for
loop.
Note: I'm not a rockstar of Rust but I'd be fine to try to implement it if you could provide some guidance. But first, I need to be sure I didn't miss a piece of documentation that would do what I want! And also if you're ok about the possibility to add such a new function and the design of the API.
Hmm, it does seem smart to write the headers here, even if no rows are written. I'd rather not introduce an additional method to do this though. I think it would be better to just make flush
do this. So if you wanted to submit a PR, then I'd try this:
- Refactor this code into an unexported helper method.
-
Modify the
flush
method to call that helper method. The call should be right before thelet result ...
line and afterself.state.panicked = true;
. This is a little tricky since you'll need to check both results for errors. - Add a test for it.
That way, this should just happen automatically since flush
is called on Drop
.
Thanks for the quick reply and the guidance. And indeed, your proposition makes a lot of sense and will be even better. I'll take a look at the code then.
TL;DR;
Writer::flush
needs a record
to figure out the headers... and I'm not sure it makes a lot of sense to send a record
to Writer::flush()
.
Long version
So I tried to implement the solution. First I did a dumb function extraction of this function as suggested into a Writer::serialize_header()
. Something seems weird at first: the fact that I needed a record to do that (but I decided to not care for now... spoiler, it was a mistake).
impl<W: io::Write> Writer<W> {
...
fn serialize_header<S: Serialize>(&mut self, record: S) -> Result<()> {
if let HeaderState::Write = self.state.header {
let wrote_header = serialize_header(self, &record)?;
if wrote_header {
self.write_terminator()?;
self.state.header = HeaderState::DidWrite;
} else {
self.state.header = HeaderState::DidNotWrite;
};
}
Ok(())
}
...
}
And then replace the Writer::serialize()
function with the following code.
impl<W: io::Write> Writer<W> {
...
pub fn serialize<S: Serialize>(&mut self, record: S) -> Result<()> {
self.serialize_header(&record)?;
serialize(self, &record)?;
self.write_terminator()?;
Ok(())
}
...
}
This simple refactor didn't break anything so far (all the tests still passed). But I was starting to wonder "how am I gonna get a record to call Writer::serialize_header()
in Writer::flush()
function?".
I tried to write some test for the use case in question (serialize only headers), I came up with the following code (based on examples/tutorial-write-serde-02.rs
).
use std::error::Error;
use std::io;
use std::process;
use serde::Serialize;
// Note that structs can derive both Serialize and Deserialize!
#[derive(Debug, Serialize)]
#[serde(rename_all = "PascalCase")]
struct Record<'a> {
city: &'a str,
state: &'a str,
}
fn run() -> Result<(), Box<dyn Error>> {
let mut wtr = csv::Writer::from_writer(io::stdout());
wtr.flush()?;
Ok(())
}
fn main() {
if let Err(err) = run() {
println!("{}", err);
process::exit(1);
}
}
But then my compiler start to give me some warning about Record
not being used... then I realized why a record was needed to write the headers. The Writer
doesn't know in advance which struct
is going to be serialized. Therefore, I guess it's possible to serialize different types of struct
with the same Writer
. This poses the question, which headers should we write in the following case.
use csv::*;
use serde::*;
#[derive(Serialize)]
struct Township {
township_name: &'static str,
}
#[derive(Serialize)]
struct Village {
village_name: &'static str,
}
fn main() {
let township = Township {
township_name: "Northville Township",
};
let village = Village {
village_name: "Milford",
};
let mut writer = WriterBuilder::default().from_path("foo.csv").unwrap();
writer.serialize(village); // (1)
writer.serialize(township); // (2)
writer.flush();
}
The above program will output the following CSV
village_name
Milford
Northville Township
But if I swap the .serialize()
lines (annotated with (1)
and (2)
), then the output CSV has different headers.
township_name
Northville Township
Milford
It seems somehow that the Writer
should either know in advance which structure is going to be serialized (but that would constraint the flexibility above of having different struct
to serialize in the same file), or that the Writer
should be explicitly told about which headers to use.
I'm not even sure which direction to take at this point. Did I miss something obvious that was in front of my eyes?
Ah right, sorry about leading you down a wild goose chase here.
I'm not sure exactly what to do here unfortunately. Tying a Writer
to a single serialized type is not what I want to do, partially because a Writer
isn't just for Serde serializing and partially because it could hinder ergonomics. i.e., You would almost certainly need a type hint somewhere in order to build such a type.
It seems like the simplest solution here is probably to do as you suggested originally: add a new serialize_header
method. The one catch is that it needs to accept a record value in order to push the serialization infrastructure along. It will have to be documented that the record value won't be written, but instead, only the headers will.
(but that would constraint the flexibility above of having different struct to serialize in the same file)
Personally, I don't like having that flexibility, although I could imagine it being useful in some contexts. The flexibility is more of a result of me not wanting to complicate the API more by pinning Writer
to a particular type. (Or even adding another Writer
-like type just for Serde stuff.)
No worry for the wild goose chase, it made me learn a bit about csv
code so I don't feel like I lost my time :smile:
I'll see what I'll do about implementing that serialize_header
function because indeed, the API would be really weird so not yet sure it's worth doing.
@woshilapin Funnily enough, #164 is a good example of where having a Reader
or a Writer
not tied to a specific type is beneficial. It affords that extra bit of flexibility that lets you handle crazily formatted data.
What a nice use case indeed (even if I don't think I understood all the technicalities 😛). Thank you for pointing it out.
For anyone coming here and as long as no solution to this is merged over here, here's a quick write-down of a workaround I found:
https://stackoverflow.com/a/73375434/2352071