openpyxl icon indicating copy to clipboard operation
openpyxl copied to clipboard

openpyxl doesn't handle pd.NA

Open anbjork opened this issue 2 years ago • 0 comments

The preferred (although still "experimental") way of handling missing values in Pandas is pd.NA https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-na https://jorisvandenbossche.github.io/blog/2019/11/30/pandas-consistent-missing-values/

openpyxl does not yet support it. Example minimal code to reproduce error

import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame([[1,2],[pd.NA,4]])

wb = openpyxl.Workbook()
ws = wb.active
for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)
# ValueError: Cannot convert <NA> to Excel

wb.save("pandas_NA_openpyxl.xlsx")

Some additional usage info:

I am using the optimised writing mode https://openpyxl.readthedocs.io/en/latest/optimized.html#write-only-mode to write files. This is very useful! The memory requirement for vanilla Pandas writing is high enough to be a showstopper in the environment my code is running. (Don't ask me why writing to file would require lots of memory.)

I am working with web applications, so writing to stream via tempfile following your advice here https://openpyxl.readthedocs.io/en/latest/tutorial.html#saving-as-a-stream ( Changing the wb.save() function to accept file objects / IO streams in addition to file paths would be neat too, since the unnecessary disk IO for going via tempfiles could be eliminated )

anbjork avatar Oct 26 '22 10:10 anbjork