polars
polars copied to clipboard
Add DataFrame.write_excel
Problem description
The read_excel file was added in https://github.com/pola-rs/polars/pull/3567 and it would be nice if there was also a DataFrame.write_excel function, maybe using xlsxwriter-rs.
We have this requirement as well. Currently, we convert the Polars dataframe to Pandas and export it from there. Would be great, of course, if we didn't have to take this detour.
Another option would be to use the pure rust rust_xlsxwriter library. The rust_xlsxwriter roadmap explains the rationale and the current features.
I wrote the initial xlsxwriter integration to Pandas so I could try a PR with some input from the core devs.
Hello,
I'm currently working on this feature using a wrapper around opepyxl. MR should come still this week, as all is working and I'm fixing tests.
I'm also trying to leave the API exatcly the same as write_csv with the same transformations.
Currently the blocking point is that the read_excel previously implemented cannot handle the inputs well.
Made a draft merge request, if any of you can take a look and suggest changes/improvements.
Another option would be to use the pure rust rust_xlsxwriter library. The rust_xlsxwriter roadmap explains the rationale and the current features.
I wrote the initial
xlsxwriterintegration to Pandas so I could try a PR with some input from the core devs.
@jmcnamara / @ritchie46: I believe I have a one line PR in the python XlsxWriter repository, heh ;) Kudos on making a Rust port - the python version is excellent! I used it to add a surprisingly comprehensive Excel export option to one of our major internal data APIs when I worked back at JPMorgan, and it was very well thought of.
I'd certainly be interested in helping shape our usage, having done it once before - could start with the python API, and once that looks good we could think about how best to adapt it on the Rust side. Having the same core library features available in both languages seems like a win, and I can speak to the quality/utility of XlsxWriter.
I believe I have a one line PR in the python XlsxWriter repository, heh ;)
Cool. :-)
I'd certainly be interested in helping shape our usage, having done it once before - could start prototyping something with the python API, and once that looks good we can see how best to adapt it on the Rust side.
That sounds like a good approach.
Finally started on this (in Python) ...
@alexander-beedie good news. If there are any enhancements to XlsxWriter (within reason) that would make integrations with Polars easier/better let me know.
@jmcnamara: so far it's a breeze, much as I remember ;)
Have integrated dtype and/or per-column formatting, float precision, conditional formatting (all flavours), table styling, total row, autoformat/autofit, and so on... Full xlsxwriter API can be employed if setting-up the Workbook object outside of the polars write_excel call, for more advanced use-cases.
Almost ready for a first cut; need to polish-up what's there and then take care of docs and do some more validation/testing. Sample output from a single call to df.write_excel(...)
(Sparklines can probably wait for a second iteration, though I definitely want to integrate those too).
Wow. Looks great. I'm looking forward to it. :-)
Wow. Looks great. I'm looking forward to it. :-)
Added a few last features today, and polished it all up along with reasonably detailed docstrings and some tests... First iteration is ready to ship: https://github.com/pola-rs/polars/pull/7251 :)
@alexander-beedie That is great work. Really strong option support from the start.
@jmcnamara: I am a conduit for your remarkable library, orchestrating access across the breadth of the xlsxwriter API... ;)
Once it has settled in a version or two, what would you think about adding some Polars-specific help pages to the xlsxwriter site, equivalent to the existing Pandas ones? (I'd be more than happy to write/commit them, assuming that the docs are part of the repository).
what would you think about adding some Polars-specific help pages to the xlsxwriter site, equivalent to the existing Pandas ones?
Absolutely. I had actually typed a suggestion like that with my previous comment and then thought that might be insensitive because you have already provided some nice examples in your polar docs. :-) From my point of view the Working with Python Pandas and XlsxWriter were necessary because I kept seeing/answering the same types of questions on StackOverflow.
I'll take a stab at creating a "Working with Polars and XlsxWriter" chaper in the next week or two and hook you in. When do you think this feature will be in a public Polars release?
I'll take a stab at creating a "Working with Polars and XlsxWriter" chaper in the next week or two and hook you in. When do you think this feature will be in a public Polars release?
Perfect; shouldn't be more than a few days until 0.16.10 drops 👍
@alexander-beedie Excellent. I'll start on the docs and hook you in once I have a basic framework (in the next couple of days).
I've added initial docs for this at Working with Polars and XlsxWriter in the main documentation. See also https://github.com/jmcnamara/XlsxWriter/issues/961
Any performance measurements against xlsxwriter (in pandas) and especially against PyExcelerate? Python really needs a way faster xlsx export library.
Any performance measurements against xlsxwriter (in pandas) and especially against PyExcelerate?
@leonkosak: Feel free to run some and let us know how it goes ;) (If there's anything obviously suboptimal I can try and optimise further).
Update: a quick & dirty timing check vs pandas shows we're roughly 50-60% faster to write the same amount of data (with default settings) while also creating a "real" Excel table object (with autofilter/etc) and adding default column formats, which pandas doesn't seem to do. (I made sure not to write the extra index col from pandas, so as to make it a fair comparison).
from codetiming import Timer
from datetime import date
import polars as pl
# quickly spin-up a 1,000,000 element DataFrame
df = pl.DataFrame({
"idx": range(250_000),
"x": 123.456789,
"y": date.today(),
"z":"testing,1.2.3.4."}
)
# export to Excel from polars
with Timer():
df.write_excel( "dataframe_pl.xlsx" )
# export to Excel from pandas
pf = df.to_pandas()
with Timer():
pf.to_excel( "dataframe_pd.xlsx", index=False )
Results: (Writing 1,000,000 mixed-type elements)
| library | time taken |
|---|---|
| polars | 3.43 secs |
| pandas | 5.50 secs |
Polars ~60% faster.
I've been working on a data handling section for the rust_xlsxwriter docs and wrote a Polar dataframe to Excel wrapper function here: https://github.com/jmcnamara/rust_xlsxwriter/issues/39
It doesn't have a fraction of the functionality of write_excel() and doesn't intend to but from a quick and dirtier implementation of your benchmark above it is ~7x faster than the Python backed writer.
I've uploaded a new Rust crate called polars_excel_writer for serializing Polars dataframes into Excel Xlsx files using rust_xlsxwriter as a backend engine.
It provides two interfaces for writing a Polars Rust dataframe to an Excel Xlsx file:
-
ExcelWritera simple Excel serializer that implements the PolarsSerWritertrait to write a dataframe to an Excel Xlsx file. This is similar to theCsvWriterinterface. -
PolarsXlsxWritera more configurable Excel serializer that resembles the interface options provided by the Polars Pythonwrite_excel()dataframe method. There is still work in progress for this interface.One useful feature of
PolarsXlsxWriteris that you can mix Polars andrust_xlsxwritercode to access Excel features not available in the current interface.
Note, this is for Rust dataframes rather than Python dataframes so folks on this thread may not be as interested. If you are and you try it out you can leave some feedback here.