dataframe
dataframe copied to clipboard
Excel add new sheet without overwriting the file
Hello, I've made a couple changes in order to be able to add sheets to an already existing file (without completely replacing it).
Overview
By using a keepFile parameter the user can now decide to add a new sheet to an Excel file, this parameter is optional and defaults to false to keep the current behavior by default.
Implementation
keepFile
- Added the new parameter as an optional setting allowing the user to add a sheet to an existing file (if set as true), leaving it blank or setting as false will keep current behavior, completely overwriting the file.
Factory
I had to update the logic by which the workbook was determined for a couple reasons:
- Current setting would always create a new file, so a new flow had to be added
- Keeping the factory function was generating 0b files when trying to reuse existing file, so it is now returning a WorkBook (keeping the wb type logic as is for xls and xlsx files).
Please let me know if something here needs to be updated as this was the less easy part of the change
Test
- Added test to test the feasibility of creating/overwriting an Excel file with data for a sheet1, and then the same data is written on a sheet2.
- Test checks if both sheet contents are the same
.gitignore
- Added new entry so the Excel file generated by the new test is not kept in version control, as it would have changes everytime the tests were rerun.
Final comments
I hope this helps, and please let me know if there is something I should update or that could have been done in a better way. Have a good weekend!
Hi! Thank you for the PR! What do you think about creating Workbook
in the user code?
Like
val wb = ...
df1.writeExcel(wb, "Sheet1")
df2.writeExcel(wb, "Sheet2")
I like your idea better, because this way people won't have to deal with factories, streams. But still interesting to understand alternatives
Hi, not a bad a idea, having null create and if a wb is provided it would add. I could even add that having writeExcel return the wb would help in this situation as I see most people would use this to create a new file and then add sheets to it, so this would be an easier way to get the wb.
However, this would require people to create the wb on their own and import the dependencies on their own (and apply the xls/xlsx logic that determines HSSF/XSSF). I feel this latest bit justifies leaving this to the function instead. Specially in Notebooks where the project can be imported with
%use dataframe
I'm not entirely sure as I haven't touched notebooks in a while but I believe in this situation the dependency to the workbook would also have to be added with the import statement which is not as straightforward.
Considering the pros and cons I would have this handled by the function writeExcel itself, although it can also be overloaded to allow both behaviors. (I can also submit this bit if needed)
I agree, and writing several sheets is a common enough task to be solved by writeExcel
itself. So keepFile
is good. Could you also add a paragraph with an example here https://github.com/Kotlin/dataframe/blob/master/docs/StardustDocs/topics/write.md?
Create a new function in
org.jetbrains.kotlinx.dataframe.samples.api.Write and follow the instruction https://github.com/Kotlin/dataframe/blob/master/docs/contributions.md.
@LeandroC89 Could you update the pr to the newest master? Then we might be able to merge it for the 0.9.0 release
Sorry it took so long. At the time there was an issue with Korro, and I haven't checked this one in a while after the import was removed! It touches a few more files now due to having taken all the changes since originally PR'ed.
Run formatKotlin
task please, then i can merge it