pyxlsb
pyxlsb copied to clipboard
Add basic writing of workbooks
This PR adds a basic capability to create and write sheets to XLSB workbooks.
Examples added to the README.md demonstrate writing data from pandas.DataFrame, numpy.ndarray, or list-of-lists to new sheets in a new workbook. Attempts are made to exploit speedups possible when writing DataFrames/NumPy-arrays versus needing to incrementally inspect the contents of list-of-lists style data.
At present, all test XLSB files created by the code in this PR are readable by the current Office365 Excel, Excel 2010, as well as pyxlsb itself. The PR's code has been exercised on Windows 10 and MacOS mostly using Python 3.8. Attempts were made to preserve compatibility with Python 2.7 as that appeared to be an unspoken goal of the pyxlsb project, though that has not been fully tested.
This PR was developed against the "v1.0.x" branch initially because of a mis-reading what was going on in willtrnr/pyxlsb's master branch. In hindsight, it is still a little unclear if this was the right or wrong choice given some of the restructuring that has happened in master versus v1.0.x branches. Perhaps having developed against "v1.0.x" permits a faster path to accepting this PR and a subsequent release? I am happy to work towards bringing this PR's functionality in line with the changes in the master branch if that would be preferred.
I hope that this PR can already be of interest and use or at least will trigger discussion. Other areas for nearer-term improvement on this PR might include:
- Exposing a convenient option to write pandas.DataFrame column names to sheets as a header row.
- Rationalizing the current overly defensive logic to permit adding sheets to existing workbooks alongside existing sheets (this is nearly possible now).
- Investigating the TODO comments in my code regarding data type compatibilities.
- Investigating the TODO comments regarding refactoring suggestions (moot if the master branch should be pursued right away).
- Adding flexibility for supporting very, very large XLSB writing.
@applio thank you for tackling this.
I am happy to work towards bringing this PR's functionality in line with the changes in the master branch if that would be preferred.
I'll take you up on that, please target master
for this PR as it's the eventual 1.1.0
release. I didn't feel there was enough user-facing improvement in master
to cut a release yet, but this here could tip the scales. I intend on only pushing fixes for the 1.0.x
series.
This is on the larger side, so it'll take some time to review. Since you'll have to rework things for the integration in master
, here's a few preliminary things I've noticed:
- I'd like to ask that you set aside the NumPy and Pandas integration for now so that we can figure out a solid design for "vanilla" Python first. We can definitely look into those in a subsequent PR.
- You have a lot of raw byte array constants, try to split them out into more readable per-field write calls (mentioning which field it is in comments would be ideal), you can refer to MS-XLSB to help with the structures.
- Try to abstract the length handling of the records, if you mirror the reading classes you can have a
DataWriter
using a buffer for pure data types andRecordWriter
for the ID, length and record portion. The situation in master is in much better shape for this since records are actual classes with the data. - Similar to the byte arrays, try using ElementTree to construct the XML literals to make things more readable.
Why is it not merged yet? Any issues?
Why is it not merged yet? Any issues?
Yes, the issues in the comment above yours.
Hi, may i ask someone if it is possible to write/append data to an existing sheet or it has to be a new sheet every time?
Closing for inactivity