jackson-dataformat-spreadsheet
jackson-dataformat-spreadsheet copied to clipboard
Extension for Jackson JSON processor that adds support for (de)serializing POJOs as Spreadsheets
= Jackson dataformat: Spreadsheet ifdef::env-github[] :tip-caption: :bulb: :note-caption: :speech_balloon: :important-caption: :blue_book: :caution-caption: :orange_book: :warning-caption: :closed_book: endif::[] :toc: :toclevels: 3 :toc-placement!: :link-oss-sonatype: https://s01.oss.sonatype.org/index.html#view-repositories;snapshots~browsestorage~/io/github/scndry/jackson-dataformat-spreadsheet/maven-metadata.xml
image:https://github.com/scndry/jackson-dataformat-spreadsheet/actions/workflows/build.yml/badge.svg[] image:https://img.shields.io/maven-metadata/v?metadataUrl=https%3A%2F%2Fs01.oss.sonatype.org%2Fservice%2Flocal%2Frepositories%2Fsnapshots%2Fcontent%2Fio%2Fgithub%2Fscndry%2Fjackson-dataformat-spreadsheet%2Fmaven-metadata.xml[]
[quote] Let's get rid of the Spread Sh!t.
This https://github.com/FasterXML/jackson[Jackson^] extension library is a component that memory-efficiently reads and writes the http://officeopenxml.com/anatomyofOOXML-xlsx.php[SpreadsheetML^] formatfootnote:[XLSX -- XSSF (XML SpreadSheet Format)] using https://poi.apache.org/[Apache POI^] and https://www.xml.com/pub/a/2003/09/17/stax.html[StAX^] via Jackson abstraction. -- Implemented as full streaming implementation, which allows full access (streaming, data-binding, and tree-model).
NOTE: The same abstraction is provided for the Horrible Microsoft Excel formatfootnote:[XLS -- HSSF (Horrible SpreadSheet Format)]. However, not based on https://poi.apache.org/components/spreadsheet/how-to.html#event_api[Event API^], so may not be suitable for large .xls files.
toc::[]
== Installation
CAUTION: This project is currently under development. If you are interested, you can always use the latest snapshot artifacts from {link-oss-sonatype}[OSS Sonatype^].
=== Maven
[source,xml]
=== Gradle
[source,groovy]
repositories { maven { url "https://s01.oss.sonatype.org/content/repositories/snapshots/" } }
dependencies { implementation "io.github.scndry:jackson-dataformat-spreadsheet:0.0.1-SNAPSHOT" }
== Busy Developers' Guide to Features
[quote] Want to use HSSF and XSSF read and write spreadsheets in a hurry? This guide is for you. -- Apache POI's https://poi.apache.org/components/spreadsheet/quick-guide.html[Quick Guide^].
=== Reading Spreadsheet
First, let's look at input-entries.xlsx file to read.
.input-entries.xlsx |=== |Column A |Column B
|1 |2 |3 |4 |===
Then let's define the Entry
class annotated @DataGrid
.
[source,java]
@DataGrid public class Entry { private int a; private int b; // Constructors, Getters, Setters and toString }
We use SpreadsheetMapper
to read the file into an Entry
object, so let’s set it up now.
[source,java]
SpreadsheetMapper mapper = new SpreadsheetMapper(); File file = new File("input-entries.xlsx");
Once we have SpreadsheetMapper
configured, simply use readValue
.
[source,java]
Entry value = mapper.readValue(file, Entry.class); // Output: Entry(a=1, b=2)
You can use readValues
to read the entire list.
[source,java]
List<Entry> values = mapper.readValues(file, Entry.class); // Output: [Entry(a=1, b=2), Entry(a=3, b=4)]
By default, it reads the first sheet.
If you need to find and read a specific sheet by name or index, you can use SheetInput
.
[source,java]
String sheetName = "Entries"; SheetInput<File> input = SheetInput.source(file, sheetName); List<Entry> values = mapper.readValues(input, Entry.class); // Output: [Entry(a=1, b=2), Entry(a=3, b=4)]
=== Writing Spreadsheet
We also use SpreadsheetMapper
to write an Entry
object to the file.
Just call writeValue
.
[source,java]
SpreadsheetMapper mapper = new SpreadsheetMapper(); File file = new File("output-entries.xlsx"); Entry value = new Entry(1, 2); mapper.writeValue(file, value);
The output of the above in the file will be:
.output-entries.xlsx [cols="1,1"] |=== |a |b |1 |2 |===
This time, let's write a list.
[source,java]
List<Entry> values = Arrays.asList(new Entry(1, 2), new Entry(3, 4)); mapper.writeValue(file, values);
java.lang.IllegalArgumentException: valueType
MUST be specified to write a value of a Collection or array type
[quote] Oh, Sheet!
Okay.
Let's add a valueType
to the 3rd parameter.
[source,java]
Class<Entry> valueType = Entry.class; mapper.writeValue(file, values, valueType);
.output-entries.xlsx [cols="1,1"] |=== |a |b |1 |2 |3 |4 |===
Now let's annotate on properties with @DataColumn
to write each column's name.
[source,java]
@DataGrid public class Entry { @DataColumn("Column A") private int a; @DataColumn("Column B") private int b; // Constructors, Getters, Setters and toString }
.output-entries.xlsx [cols="1,1"] |=== |Column A |Column B |1 |2 |3 |4 |===
And also you can use SheetOutput
to write the sheet name.
By default, the created sheet is named "Sheet" + _indexOfSheet_
.
[source,java]
String sheetName = "Entries"; SheetOutput<File> output = SheetOutput.target(file, sheetName); mapper.writeValue(output, values, Entry.class);
=== Nested Object (de)Structuring
Spreadsheets are a flat data representation of a 2-dimensional structure, but most of the POJOs we have take complex nested structures. Of course, it provides (de)structuring these complex models, as you can see below:
|=== |ID |NAME |ZIPCODE |ADDRESS LINE 1 |ADDRESS LINE 2 |DESIGNATION |SALARY
|1 |John Doe |12345 |123 Main St. |Anytown, USA |CEO |$300,000.00 |===
As we are Object-Oriented Developers, you probably don't want to manage a lot of column data in a flat model.
Let's first define the well-structured Employee
class.
[source,java]
@DataGrid class Employee { int id; String name; Address address; Employment employment; // ... }
class Address { String zipcode; AddressLines addressLines; // ... }
class AddressLines { String line1; String line2; // ... }
class Employment { Designation designation; long salary; // ... }
enum Designation { CEO, CTO, SM, ARCH, INT }
No additional code is required when deserializing data into nested structures, you can deserialize the same way.
[source,java]
Employee value = mapper.readValue(input, Employee.class);
Here’s the Employee
object that output will be:
Employee( id=1 name=John doe address=Address( zipcode=12345 addressLines=AddressLines( line1=123 Main St. line2=Anytown, USA ) ) employment=Emeployment( designation=CEO salary=300000 ) )
Just as can be deserialized without any additional code, you can serialize these nested objects.
[source,java]
mapper.writeValue(output, values, Employee.class);
==== Writing the Nested List
Although limited, it supports serializing nested list structures. However, deserializing is not supported due to implementation complexity.
[source,java]
@DataGrid class Outer { @DataColumn("A") int a; List<Inner> inners; @DataColumn("E") int e; // ... }
class Inner { @DataColumn("B") int b; @DataColumn("C") int[] c; @DataColumn("D") int d; // ... }
[source,java]
Outer value = new Outer( 1, Arrays.asList( new Inner(2, new int[]{3, 4}, 5), new Inner(6, new int[]{7, 8}, 9)), 10); mapper.writeValue(output, value);
This will lead to the desired result like this:
[cols="1a,1a,1a,1a,1a"] |=== |A |B |C |D |E
|1 |2 |3 |5 |10
|BLANK
|BLANK
|4 |BLANK
|BLANK
|BLANK
|6 |7 |9 |BLANK
|BLANK
|BLANK
|8 |BLANK
|BLANK
|===
=== Annotations and Schema
For most column schema generation requirements, @DataGrid
and @DataColumn
annotations will provide all you need.
The two have almost similar attributes, so only listing attributes of @DataColumn
.
.Attributes in @DataColumn
-
String value()
: The column's name to write in header -
String style()
: Thename
of thecellStyle
to apply to the column, it should be registered withStylesBuilder
. -
String headerStyle()
: Same asstyle()
, but only applied to the column's header. -
int width()
: Fixed width for the column. -
OptBoolean autoSize()
: Adjusts the column width to fit the contents. Not accurate for https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms[fullwidth forms^] like the https://en.wikipedia.org/wiki/CJK_characters[CJK characters^]. -
int minWidth()
: Minimum width of the auto-sized column. -
int maxWidth()
: Maximum width of the auto-sized column.
==== Attribute Resolution
Column schema uses particular order designed to allow for value overrides. Attributes are considered in the following order:
. @DataColumn
's attribute on property.
. @DataGrid
's attribute on the class in which the property is declared.
. @DataGrid
's attribute of the class of which the class in which the property is declared is a member.
For a better understanding, let's look at the code below:
[source,java]
@DataGrid(autoSizeColumn = OptBoolean.FALSE) class Foo { @DataColumn(style = "Foo_a_style") int a; @DataColumn(headerStyle = "Foo_b_headerStyle") int b; // ... }
@DataGrid(columnHeaderStyle = "Bar_columnHeaderStyle") class Bar { Foo foo; // ... }
@DataGrid(columnStyle = "Baz_columnStyle", autoSizeColumn = OptBoolean.TRUE) class Baz { Foo foo; // ... }
Schema is generated and used when (de)serializing within SpreadsheetMapper
, but you can generate using sheetSchemaFor
.
[source,java]
SpreadsheetSchema foo = mapper.sheetSchemaFor(Foo.class); SpreadsheetSchema bar = mapper.sheetSchemaFor(Bar.class); SpreadsheetSchema baz = mapper.sheetSchemaFor(Baz.class);
Column attributes of schemas generated for each of the above types are as follows:
[cols="1,1a,1a,1a"] |=== |Column |Style |Header Style| Auto Size
|foo.a
|"Foo_a_style"
|""
|FALSE
|foo.b
|""
|"Foo_b_headerStyle"
|FALSE
|bar.foo.a
|"Foo_a_style"
|"Bar_columnHeaderStyle"
|FALSE
|bar.foo.b
|""
|"Foo_b_headerStyle"
|FALSE
|baz.foo.a
|"Foo_a_style"
|""
|FALSE
|baz.foo.b
|"Baz_columnStyle"
|"Foo_b_headerStyle"
|FALSE
|===
=== Build Your Own Styles
[source,java]
@DataGrid(columnHeaderStyle = "baseHeader") class Employee { // ... Address address; Employment employment; // ... }
class Address { @DataColumn(value = "ZIPCODE", style = "zipcode") String zipcode; // ... }
@DataGrid(columnHeaderStyle = "employmentHeader") class Employment { // ... @DataColumn(value = "SALARY", style = "salary", headerStyle = "salaryHeader") long salary; // ... }
[source,java]
StylesBuilder builder = new StylesBuilder() .cellStyle(/name/ "baseHeader") .border().thin() .fillForegroundColor(IndexedColors.GREY_25_PERCENT) .fillPattern().solidForeground() .font().bold().end() .end() .cellStyle(/name/ "employmentHeader", /cloneStyleFrom/ "baseHeader") .fillForegroundColor(198, 239, 206) .end() .cellStyle("salaryHeader", "baseHeader") .fillForegroundColor(0xFFC7CE) .font().bold().color(0x9C0006).end() .end() .cellStyle("salary") .dataFormat("[Red][>=100000]$#,##0.00;$#,##0.00") .end() .cellStyle("zipcode") .dataFormat().text() .end(); SpreadsheetMapper mapper = SpreadsheetMapper.builder() .stylesBuilder(builder) .build(); mapper.writeValue(output, values, type);
.Built-in data formats for convenience
-
general()
:"General"
-
text()
:"@"
-
numberInt()
:"0"
-
numberFloat()
:"0.00"
-
numberIntWithComma()
:"#,##0"
-
numberFloatWithComma()
:"#,##0.00"
-
date()
:"yyyy-mm-dd"
-
dateTime()
:"yyyy-mm-dd hh:mm:ss"
[source,java]
SpreadsheetMapper mapper = SpreadsheetMapper.builder() .addModule(new ExcelDateModule()) .build();
=== Change Origin Address
[source,java]
CellAddress address = ...; SpreadsheetMapper mapper = SpreadsheetMapper.builder() .origin(address) .origin("B2") .origin(1, 1) .build(); mapper.writeValue(output, values, type);
[cols="1a,1a,1a"]
|===
|BLANK
|BLANK
|BLANK
|BLANK
|Column A
|Column B
|BLANK
|1
|2
|BLANK
|3
|4
|===
=== Overwrite Column Names
[source,java]
SpreadsheetMapper mapper = SpreadsheetMapper.builder() .columnNameResolver(prop -> "Overwrite " + prop.getName().toUpperCase()) .build(); mapper.writeValue(output, values, type);
[cols="1,1"] |=== |Overwrite A |Overwrite B |1 |2 |3 |4 |===
[source,java]
enum ColumnCode { A("Code A"), B("Code B"); String text; // ... }
@Retention(RetentionPolicy.RUNTIME) @interface NameOf { ColumnCode value(); }
@DataGrid class Entity { @NameOf(ColumnCode.A) @DataColumn("It will be overwritten") int a; @NameOf(ColumnCode.B) int b; // ... }
[source,java]
ColumnNameResolver byText = AnnotatedNameResolver.forValue(NameOf.class, ColumnCode::getText); SpreadsheetMapper mapper = SpreadsheetMapper.builder() .columnNameResolver(byText) .build(); mapper.writeValue(output, values, type);
[cols="1,1"] |=== |Code A |Code B |1 |2 |3 |4 |===
=== Reading Under Control
[source,java]
SpreadsheetReader reader = mapper.sheetReaderFor(Entry.class); try (SheetMappingIterator<Entry> iterator = reader.readValues(input)) { while (iterator.hasNext()) { Entry value = iterator.next(); SheetLocation location = iterator.getCurrentLocation(); boolean done = ...; if (done) { break; } } }
[source,java]
SpreadsheetMapper mapper = SpreadsheetMapper.builder(); .enable(SheetParser.Feature.BREAK_ON_BLANK_ROW) .build();
.On/Off Features in SheetParser.Feature
-
BLANK_ROW_AS_NULL
(defaulttrue
) -
BREAK_ON_BLANK_ROW
(defaultfalse
)
== And more.
=== Supported Types
- Deserialization
**
java.io.File
**java.io.InputStream
**io.github.scndry.jackson.dataformat.spreadsheet.deser.SheetInput<T>
**org.apache.poi.ss.usermodel.Sheet
- Serialization
**
java.io.File
**java.io.OutputStream
**io.github.scndry.jackson.dataformat.spreadsheet.ser.SheetOutput<T>
**org.apache.poi.ss.usermodel.Sheet
=== References
- https://github.com/FasterXML/jackson-docs[Jackson documentation hub^]
- https://poi.apache.org/components/spreadsheet/index.html[POI-HSSF and POI-XSSF/SXSSF^]
== License
Project is licensed under https://www.apache.org/licenses/LICENSE-2.0.txt[Apache License 2.0^].