polars icon indicating copy to clipboard operation
polars copied to clipboard

Add DataFrame.write_excel

Open rth opened this issue 3 years ago • 6 comments
trafficstars

Problem description

The read_excel file was added in https://github.com/pola-rs/polars/pull/3567 and it would be nice if there was also a DataFrame.write_excel function, maybe using xlsxwriter-rs.

rth avatar Nov 21 '22 09:11 rth

We have this requirement as well. Currently, we convert the Polars dataframe to Pandas and export it from there. Would be great, of course, if we didn't have to take this detour.

dominikpeter avatar Jan 04 '23 14:01 dominikpeter

Another option would be to use the pure rust rust_xlsxwriter library. The rust_xlsxwriter roadmap explains the rationale and the current features.

I wrote the initial xlsxwriter integration to Pandas so I could try a PR with some input from the core devs.

jmcnamara avatar Jan 11 '23 09:01 jmcnamara

Hello,

I'm currently working on this feature using a wrapper around opepyxl. MR should come still this week, as all is working and I'm fixing tests.

I'm also trying to leave the API exatcly the same as write_csv with the same transformations.

Currently the blocking point is that the read_excel previously implemented cannot handle the inputs well.

bvanelli avatar Jan 11 '23 09:01 bvanelli

Made a draft merge request, if any of you can take a look and suggest changes/improvements.

bvanelli avatar Jan 11 '23 22:01 bvanelli

Another option would be to use the pure rust rust_xlsxwriter library. The rust_xlsxwriter roadmap explains the rationale and the current features.

I wrote the initial xlsxwriter integration to Pandas so I could try a PR with some input from the core devs.

@jmcnamara / @ritchie46: I believe I have a one line PR in the python XlsxWriter repository, heh ;) Kudos on making a Rust port - the python version is excellent! I used it to add a surprisingly comprehensive Excel export option to one of our major internal data APIs when I worked back at JPMorgan, and it was very well thought of.

I'd certainly be interested in helping shape our usage, having done it once before - could start with the python API, and once that looks good we could think about how best to adapt it on the Rust side. Having the same core library features available in both languages seems like a win, and I can speak to the quality/utility of XlsxWriter.

alexander-beedie avatar Jan 12 '23 06:01 alexander-beedie

I believe I have a one line PR in the python XlsxWriter repository, heh ;)

Cool. :-)

I'd certainly be interested in helping shape our usage, having done it once before - could start prototyping something with the python API, and once that looks good we can see how best to adapt it on the Rust side.

That sounds like a good approach.

jmcnamara avatar Jan 12 '23 08:01 jmcnamara

Finally started on this (in Python) ...

alexander-beedie avatar Feb 24 '23 17:02 alexander-beedie

@alexander-beedie good news. If there are any enhancements to XlsxWriter (within reason) that would make integrations with Polars easier/better let me know.

jmcnamara avatar Feb 25 '23 17:02 jmcnamara

@jmcnamara: so far it's a breeze, much as I remember ;)

Have integrated dtype and/or per-column formatting, float precision, conditional formatting (all flavours), table styling, total row, autoformat/autofit, and so on... Full xlsxwriter API can be employed if setting-up the Workbook object outside of the polars write_excel call, for more advanced use-cases.

Almost ready for a first cut; need to polish-up what's there and then take care of docs and do some more validation/testing. Sample output from a single call to df.write_excel(...)

polars_excel_integration

(Sparklines can probably wait for a second iteration, though I definitely want to integrate those too).

alexander-beedie avatar Feb 26 '23 19:02 alexander-beedie

Wow. Looks great. I'm looking forward to it. :-)

jmcnamara avatar Feb 26 '23 20:02 jmcnamara

Wow. Looks great. I'm looking forward to it. :-)

Added a few last features today, and polished it all up along with reasonably detailed docstrings and some tests... First iteration is ready to ship: https://github.com/pola-rs/polars/pull/7251 :)

alexander-beedie avatar Feb 28 '23 20:02 alexander-beedie

@alexander-beedie That is great work. Really strong option support from the start.

jmcnamara avatar Mar 01 '23 11:03 jmcnamara

@jmcnamara: I am a conduit for your remarkable library, orchestrating access across the breadth of the xlsxwriter API... ;)

Once it has settled in a version or two, what would you think about adding some Polars-specific help pages to the xlsxwriter site, equivalent to the existing Pandas ones? (I'd be more than happy to write/commit them, assuming that the docs are part of the repository).

alexander-beedie avatar Mar 01 '23 13:03 alexander-beedie

what would you think about adding some Polars-specific help pages to the xlsxwriter site, equivalent to the existing Pandas ones?

Absolutely. I had actually typed a suggestion like that with my previous comment and then thought that might be insensitive because you have already provided some nice examples in your polar docs. :-) From my point of view the Working with Python Pandas and XlsxWriter were necessary because I kept seeing/answering the same types of questions on StackOverflow.

I'll take a stab at creating a "Working with Polars and XlsxWriter" chaper in the next week or two and hook you in. When do you think this feature will be in a public Polars release?

jmcnamara avatar Mar 01 '23 14:03 jmcnamara

I'll take a stab at creating a "Working with Polars and XlsxWriter" chaper in the next week or two and hook you in. When do you think this feature will be in a public Polars release?

Perfect; shouldn't be more than a few days until 0.16.10 drops 👍

alexander-beedie avatar Mar 01 '23 14:03 alexander-beedie

@jmcnamara: it's out now - just had time to squeeze-in sparkline support too ;)

alexander-beedie avatar Mar 04 '23 06:03 alexander-beedie

@alexander-beedie Excellent. I'll start on the docs and hook you in once I have a basic framework (in the next couple of days).

jmcnamara avatar Mar 04 '23 09:03 jmcnamara

I've added initial docs for this at Working with Polars and XlsxWriter in the main documentation. See also https://github.com/jmcnamara/XlsxWriter/issues/961

jmcnamara avatar Mar 06 '23 01:03 jmcnamara

Any performance measurements against xlsxwriter (in pandas) and especially against PyExcelerate? Python really needs a way faster xlsx export library.

leonkosak avatar Mar 22 '23 09:03 leonkosak

Any performance measurements against xlsxwriter (in pandas) and especially against PyExcelerate?

@leonkosak: Feel free to run some and let us know how it goes ;) (If there's anything obviously suboptimal I can try and optimise further).

Update: a quick & dirty timing check vs pandas shows we're roughly 50-60% faster to write the same amount of data (with default settings) while also creating a "real" Excel table object (with autofilter/etc) and adding default column formats, which pandas doesn't seem to do. (I made sure not to write the extra index col from pandas, so as to make it a fair comparison).

from codetiming import Timer
from datetime import date
import polars as pl

# quickly spin-up a 1,000,000 element DataFrame
df = pl.DataFrame({
    "idx": range(250_000),
    "x": 123.456789, 
    "y": date.today(), 
    "z":"testing,1.2.3.4."}
)

# export to Excel from polars
with Timer():
    df.write_excel( "dataframe_pl.xlsx" )
    
# export to Excel from pandas
pf = df.to_pandas()
with Timer():
    pf.to_excel( "dataframe_pd.xlsx", index=False )

Results: (Writing 1,000,000 mixed-type elements)

library time taken
polars 3.43 secs
pandas 5.50 secs

Polars ~60% faster.

alexander-beedie avatar Apr 27 '23 19:04 alexander-beedie

I've been working on a data handling section for the rust_xlsxwriter docs and wrote a Polar dataframe to Excel wrapper function here: https://github.com/jmcnamara/rust_xlsxwriter/issues/39

It doesn't have a fraction of the functionality of write_excel() and doesn't intend to but from a quick and dirtier implementation of your benchmark above it is ~7x faster than the Python backed writer.

jmcnamara avatar May 12 '23 12:05 jmcnamara

I've uploaded a new Rust crate called polars_excel_writer for serializing Polars dataframes into Excel Xlsx files using rust_xlsxwriter as a backend engine.

It provides two interfaces for writing a Polars Rust dataframe to an Excel Xlsx file:

  • ExcelWriter a simple Excel serializer that implements the Polars SerWriter trait to write a dataframe to an Excel Xlsx file. This is similar to the CsvWriter interface.

  • PolarsXlsxWriter a more configurable Excel serializer that resembles the interface options provided by the Polars Python write_excel() dataframe method. There is still work in progress for this interface.

    One useful feature of PolarsXlsxWriter is that you can mix Polars and rust_xlsxwriter code to access Excel features not available in the current interface.

Note, this is for Rust dataframes rather than Python dataframes so folks on this thread may not be as interested. If you are and you try it out you can leave some feedback here.

jmcnamara avatar Aug 20 '23 20:08 jmcnamara