jackson-future-ideas
Would a new module, jackson-dataformat-excel, be feasible?
Currently there are no libraries for xlsx-to-POJO mapping. jackson-dataformat-csv exists, but sometimes we need xlsx/xls. I would like to contribute to this module if possible.
Always happy to get more format modules. But what kind of format is xlsx? Excel-native binary format of some kind?
https://en.wikipedia.org/wiki/Office_Open_XML
it's a zipped XML format
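A quick way to see the "zipped XML" structure is to list a workbook's entries with `java.util.zip`. The sketch below builds a tiny stand-in zip instead of reading a real workbook, and `XlsxEntries` is a hypothetical name. A real .xlsx typically contains entries such as `[Content_Types].xml`, `xl/workbook.xml` and `xl/worksheets/sheet1.xml`, though the exact set depends on the application that wrote it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class XlsxEntries {
    // Lists the entry names of a zip container, such as an .xlsx workbook, using only the JDK.
    static List<String> listEntries(byte[] container) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(container))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Hypothetical helper: a tiny zip mimicking two typical xlsx entry names. It is NOT a valid workbook.
    static byte[] fakeXlsx() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"[Content_Types].xml", "xl/worksheets/sheet1.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // For a real file: listEntries(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("book.xlsx")))
        System.out.println(listEntries(fakeXlsx()));
        // prints [[Content_Types].xml, xl/worksheets/sheet1.xml]
    }
}
```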
We're using https://github.com/sett4/jackson-dataformat-xlsx-lite and are quite satisfied with the implementation.
Thank you for the link, that is hopefully useful.
As to a specific data format module: if it's just zipped XML, it sits at an interesting level above a dataformat, possibly closer to a datatype module. Unzipping does not usually belong at the Jackson dataformat level (in my opinion), and the XML itself could (and perhaps should) be handled via the XML format module. But then again, there may be some kinks in the details of the XML application (~= schema) used, which need to be translated into a more usable form.
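To illustrate the layering argument, here is a minimal sketch that treats a worksheet part (for example `xl/worksheets/sheet1.xml`, assumed to be already extracted from the zip) as plain XML and reads it with the JDK's StAX API. This is not jackson-dataformat-xml, and `SheetXmlReader` is a hypothetical name. It also shows one of the schema-level kinks: a cell with `t="s"` stores an index into the shared-strings part rather than its text. The sketch ignores cell references (`r="C1"`), so rows with gaps would shift left, and it returns numbers and dates as raw strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SheetXmlReader {
    // Pulls cell text out of a SpreadsheetML worksheet part, row by row.
    // Only <v> (value) and <t> (inline string) text is kept; formulas in <f> are skipped.
    static List<List<String>> readRows(String sheetXml, List<String> sharedStrings) {
        List<List<String>> rows = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(sheetXml));
            List<String> row = new ArrayList<>();
            String cellType = null;
            StringBuilder text = new StringBuilder();
            boolean inValue = false;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("row")) {
                        row = new ArrayList<>();
                    } else if (name.equals("c")) {
                        cellType = r.getAttributeValue(null, "t");
                        text.setLength(0);
                    } else if (name.equals("v") || name.equals("t")) {
                        inValue = true;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && inValue) {
                    text.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("v") || name.equals("t")) {
                        inValue = false;
                    } else if (name.equals("c")) {
                        String raw = text.toString();
                        // t="s": the value is a zero-based index into xl/sharedStrings.xml.
                        row.add("s".equals(cellType)
                            ? sharedStrings.get(Integer.parseInt(raw.trim()))
                            : raw);
                    } else if (name.equals("row")) {
                        rows.add(row);
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed worksheet XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>"
            + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\" t=\"inlineStr\"><is><t>age</t></is></c></row>"
            + "<row r=\"2\"><c r=\"A2\" t=\"s\"><v>1</v></c><c r=\"B2\"><f>40+2</f><v>42</v></c></row>"
            + "</sheetData></worksheet>";
        System.out.println(readRows(xml, Arrays.asList("name", "Alice")));
        // prints [[name, age], [Alice, 42]]
    }
}
```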
The link that @cproof provided is an interesting start, though it currently appears to support only writing, not reading. Another interesting approach is to wrap dataformat-csv and convert the xlsx into CSV before any reading (or the reverse for writing). A limitation of this approach is that it would disconnect xlsx features, like formulas, from the implementation and essentially let you treat a Sheet in an Excel workbook as if it were a CSV file. This might be a good first approximation of what most people would use this for, but it may not support all cases.
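A minimal sketch of the "convert the sheet to CSV first" idea, assuming cell values have already been read out of the workbook as strings (`SheetToCsv` is a hypothetical name). Quoting follows RFC 4180 conventions, but lines end with LF rather than CRLF. The resulting text is what would then be handed to jackson-dataformat-csv; by this point formulas and styles are already gone, which is the limitation mentioned above.

```java
import java.util.Arrays;
import java.util.List;

class SheetToCsv {
    // Quotes a field only when it contains a comma, double quote, CR or LF; embedded quotes are doubled.
    static String escape(String field) {
        if (field == null) return ""; // treat empty cells as empty fields
        boolean needsQuotes = field.contains(",") || field.contains("\"")
            || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Serializes already-evaluated cell values row by row; nothing about formulas or styles survives.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(escape(row.get(i)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("name", "note"),
            Arrays.asList("Alice", "says \"hi\", twice"));
        System.out.print(toCsv(rows));
        // prints:
        // name,note
        // Alice,"says ""hi"", twice"
    }
}
```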
I have used the Apache POI library to read/write Excel. So instead of rewriting from scratch, we can utilise this library and just put a FasterXML ecosystem wrapper over it.
Please note this library not only reads/writes Excel files (in different formats), it also applies styles to Excel sheets. https://poi.apache.org/
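```xml
<!-- "poi" covers the binary .xls format (HSSF); .xlsx (XSSF) support is in the separate "poi-ooxml" artifact. -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.9</version>
</dependency>
```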
Interesting thought... yes, there are the binary formats (.xls) too.
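Having both variants means a reader has to tell them apart. A minimal sketch, assuming detection by leading magic bytes only: .xlsx is a zip archive (starting with `PK\3\4`), while legacy binary .xls lives in an OLE2 compound file. `ExcelFormatSniffer` is a hypothetical name, and a match only means "likely", since .docx and .doc files use the same containers.

```java
import java.util.Arrays;

class ExcelFormatSniffer {
    // Zip local file header signature "PK\3\4"; OOXML files such as .xlsx start with it.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 / Compound File Binary signature; legacy .xls files start with it.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    // Classifies the first bytes of a file by container type, not by actual content.
    static String sniff(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) return "zip (likely xlsx)";
        if (startsWith(header, OLE2_MAGIC)) return "ole2 (likely xls)";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00})); // zip (likely xlsx)
    }
}
```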
And a big part of the value, I think, is exposing contents as a structured stream of tokens that can be bound to typed objects. Currently this is best supported by the CSV format module, but the actual logic of taking table/2D-style content and exposing it as a stream of Object-like values (or String[], depending on configuration) is quite simple after decoding.
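The "quite simple after decoding" step can be sketched in plain Java. This assumes the sheet has already been decoded into rows of strings; `TableRows` is a hypothetical name, and treating the first row as a header is one common convention (CSV modules typically make it optional), not something fixed for Excel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableRows {
    // "String[]" mode: each row becomes an array, with no header handling.
    static List<String[]> asArrays(List<List<String>> table) {
        List<String[]> out = new ArrayList<>();
        for (List<String> row : table) {
            out.add(row.toArray(new String[0]));
        }
        return out;
    }

    // "Object-like" mode: the first row supplies property names, each later row becomes a map.
    static List<Map<String, String>> asMaps(List<List<String>> table) {
        List<Map<String, String>> out = new ArrayList<>();
        if (table.isEmpty()) return out;
        List<String> header = table.get(0);
        for (List<String> row : table.subList(1, table.size())) {
            Map<String, String> obj = new LinkedHashMap<>(); // keeps column order
            for (int i = 0; i < header.size(); i++) {
                // Missing trailing cells become null.
                obj.put(header.get(i), i < row.size() ? row.get(i) : null);
            }
            out.add(obj);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> table = Arrays.asList(
            Arrays.asList("name", "age"),
            Arrays.asList("Alice", "42"),
            Arrays.asList("Bob"));
        System.out.println(asMaps(table)); // [{name=Alice, age=42}, {name=Bob, age=null}]
        System.out.println(asArrays(table).size()); // 3
    }
}
```

In an actual dataformat module these rows would more likely be emitted as `JsonToken` events so that the usual databinding applies unchanged; the maps here just make the shape of the stream visible.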
Thinking out loud beyond Excel: I am sure there are other application-specific formats/conventions that would be interesting to expose via the Jackson streaming API/abstraction, too. The question is whether there would be a need for, or benefit from, some kind of mid-level layer; for example, is there some metadata that needs to be surfaced?
Or maybe this would simply be something like jackson-dataformats-app, for "application formats", which would contain implementations but work as regular Jackson dataformat backends.
But as to POI, and Excel in general: trying to deal with content logic (like style application) would become tricky quite soon, I think.