Expose `date_format`, `time_format`, and `datetime_format` parameters to `DataFrame.write_csv`

Open matteosantama opened this issue 3 years ago • 3 comments

I notice that CsvWriter has a field of type SerializeOptions, which in turn supports:

  • date_format
  • time_format
  • datetime_format

I'd like to expose these parameters on the Python side in DataFrame.write_csv. We could additionally add support for float_format as requested in #4279
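To make the intent of the three parameters concrete, here is a minimal standard-library sketch of the semantics I have in mind. It is not polars code: `write_csv_rows` and its default format strings are hypothetical, and it uses Python's `strftime` rather than the Rust-side formatter.

```python
import csv
import io
from datetime import date, datetime, time


def write_csv_rows(
    rows,
    *,
    date_format="%Y-%m-%d",               # assumed default, for illustration only
    time_format="%H:%M:%S",               # assumed default, for illustration only
    datetime_format="%Y-%m-%dT%H:%M:%S",  # assumed default, for illustration only
):
    """Hypothetical helper: render temporal values with per-type format strings."""

    def render(value):
        # Check datetime first, because datetime is a subclass of date.
        if isinstance(value, datetime):
            return value.strftime(datetime_format)
        if isinstance(value, date):
            return value.strftime(date_format)
        if isinstance(value, time):
            return value.strftime(time_format)
        return value

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([render(v) for v in row])
    return buf.getvalue()


rows = [(date(2022, 8, 10), time(21, 8), datetime(2022, 8, 10, 21, 8, 0))]
out = write_csv_rows(
    rows,
    date_format="%d/%m/%Y",
    time_format="%H:%M",
    datetime_format="%Y-%m-%d %H:%M",
)
```

Each temporal type picks up its own format string, and every other value is written unchanged.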

@ritchie46 any objections?

matteosantama, Aug 10 '22 21:08

@ritchie46 any objections?

Nope, that would be great!

ritchie46, Aug 11 '22 08:08

I'd like to expose these parameters on the Python side in DataFrame.write_csv. We could additionally add support for float_format as requested in #4279

Spooky, I was thinking about doing something very similar, with some small additions:

Default datetime_format (on the Rust side) should really be per-timeunit (all datetimes currently output as ns):

  • ns >> "%S.%9f"
  • us >> "%S.%6f"
  • ms >> "%S.%3f"

Then the Python-side datetime_format param could be EITHER a single string (in which case all datetime columns get that format, regardless of timeunit) OR a {timeunit: format} dict (so you could override each individually if you want).
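A rough sketch of how that string-or-dict resolution could work, in plain Python. The function name `resolve_datetime_format` is hypothetical, and the defaults are only the seconds fragments listed above, not complete format strings.

```python
from typing import Mapping, Optional, Union

# Per-timeunit defaults as listed above (seconds fragment only).
DEFAULT_SECONDS_FORMAT = {
    "ns": "%S.%9f",
    "us": "%S.%6f",
    "ms": "%S.%3f",
}


def resolve_datetime_format(
    datetime_format: Optional[Union[str, Mapping[str, str]]],
    time_unit: str,
) -> str:
    """Hypothetical helper: pick the format for one datetime column."""
    if time_unit not in DEFAULT_SECONDS_FORMAT:
        raise ValueError(f"unknown time unit: {time_unit!r}")
    if datetime_format is None:
        # Nothing supplied: fall back to the per-timeunit default.
        return DEFAULT_SECONDS_FORMAT[time_unit]
    if isinstance(datetime_format, str):
        # A single string applies to every datetime column, whatever its unit.
        return datetime_format
    # A {timeunit: format} mapping overrides only the units it mentions.
    return datetime_format.get(time_unit, DEFAULT_SECONDS_FORMAT[time_unit])
```

With this shape, `resolve_datetime_format({"us": "%S"}, "ns")` still returns the ns default, so a partial dict only overrides the units it names.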

Also:

  • Option of a custom string for null_value (defaulting to the empty string, as it is now).
  • Option of a custom empty_string (defaulting to two double-quotes, as it is now).
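Roughly what I mean by those two options, as a plain-Python sketch. `render_field` is a hypothetical helper, not an existing writer function, and it ignores quoting and escaping of ordinary values.

```python
def render_field(value, *, null_value="", empty_string='""'):
    """Hypothetical helper: render one CSV field, distinguishing null from ""."""
    if value is None:
        # Missing value: written as the configured null marker.
        return null_value
    if value == "":
        # Genuine empty string: by default two double-quotes, so it stays
        # distinguishable from a null when null_value is the empty string.
        return empty_string
    return str(value)


default_row = ",".join(render_field(v) for v in ["a", None, ""])
custom_row = ",".join(render_field(v, null_value="NA") for v in ["a", None, ""])
```

Keeping the two markers separate is what lets a reader of the file tell a null from an empty string.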

@matteosantama: I can look at doing this after you've made a patch, or would you like to incorporate some of it?

alexander-beedie, Aug 11 '22 10:08

@alexander-beedie I think that's an awesome idea.

Sounds like there are a few things we want to do, each of which should have its own MR:

  • [ ] Expose pre-existing datetime_format, date_format, and time_format functionality on the Python side #4364
  • [ ] Enable more sophisticated datetime_format specifications
  • [ ] Create a new float_format parameter
  • [ ] Options for null_value and empty_string output.

I've just opened up an MR for (1), so you can build off that for (2). And then (3) and (4) can come when those are complete.

matteosantama, Aug 11 '22 13:08

I've just opened up an MR for (1)

Nice one; I'm off to Istanbul tomorrow for a few days (for the first time), so will dive in properly once I get back!

FYI: most of the CSV (and other export/write) tests are ideal candidates for parametric testing (see tests_parametric and the introduction on the original PR for inspiration). I'll definitely add some later, but have a look in the interim if you like :)

alexander-beedie, Aug 11 '22 15:08

@matteosantama: well, got the custom null_value in, and we're now most of the way towards true per-unit datetime formatting... (will have to see about refining that further some time).

Glad I could build off your work :)

alexander-beedie, Sep 04 '22 19:09