XSSFColumn class implementation
XSSFColumn class
This PR adds an `IColumn` interface (similar to the `IRow` interface) and implements it in an `XSSFColumn` object. `XSSFSheet` is refactored to use that new object for all column operations, in much the same way row operations are done: copying, cutting, pasting, shifting (with attention to formulas), formatting (styles, width, hidden state, grouping and ungrouping), adding new columns, and removing cells by column. Pretty much everything that is possible for `XSSFRow` objects is now possible with `XSSFColumn`, through a more fluent and easier-to-use API than the one `ColumnHelper` provided.
The major hurdle was the way columns are stored in `sheet.xml`: `CT_Col` objects are not individual columns but "spans" of columns. If columns 5 to 10 have the same style, width, hidden status and outline level, they are represented as a single `CT_Col` object with a `min` field of 5 and a `max` field of 10. So it's one "span" of columns from 5 to 10, rather than 6 columns, which makes it hard to work with individual columns. For this reason this PR also changes the way `CT_Col` objects are parsed and stored in `XSSFSheet`: when a workbook is read from a file, the "spans" are broken down into individual `CT_Col` objects with `min` and `max` set to the same value, so that each `CT_Col` object represents a single column. When a workbook is written to a file, the `CT_Col` objects are merged back into "spans" according to their style, width, hidden status and outline level, to match the way Excel stores columns in `sheet.xml`.
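The read/write round trip described above can be sketched independently of NPOI. The following Python snippet is only an illustration of the idea, not the PR's code: the `Col` dataclass and the `expand`, `same_format` and `merge` helpers are hypothetical names, and the default width is an arbitrary value chosen for the sketch.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a CT_Col record (not the NPOI type)."""
    min: int                 # first column of the span
    max: int                 # last column of the span, inclusive
    style: int = 0
    width: float = 8.43      # arbitrary default for this sketch
    hidden: bool = False
    outline_level: int = 0


def expand(spans: List[Col]) -> List[Col]:
    """On read: break every span into one record per column (min == max)."""
    return [replace(s, min=i, max=i)
            for s in spans
            for i in range(s.min, s.max + 1)]


def same_format(a: Col, b: Col) -> bool:
    """Two columns may share a span only if all four properties match."""
    return ((a.style, a.width, a.hidden, a.outline_level)
            == (b.style, b.width, b.hidden, b.outline_level))


def merge(cols: List[Col]) -> List[Col]:
    """On write: fold adjacent, identically formatted columns into spans."""
    merged: List[Col] = []
    for c in sorted(cols, key=lambda col: col.min):
        last = merged[-1] if merged else None
        if last is not None and last.max + 1 == c.min and same_format(last, c):
            merged[-1] = replace(last, max=c.max)   # extend the open span
        else:
            merged.append(c)                        # start a new span
    return merged


spans = [Col(min=5, max=10, style=3), Col(min=11, max=11, style=4)]
singles = expand(spans)
print(len(singles))              # 7 single-column records (5..10 plus 11)
print(merge(singles) == spans)   # True: the round trip restores the spans
```

In this sketch a gap between columns or any difference in the four properties starts a new span, which mirrors the merge criteria listed in the description.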
This still had a drawback. If all columns on the sheet had the same formatting all the way to the end of the sheet, the last `CT_Col` object in `sheet.xml` would have its `max` field set to the maximum number of columns in Excel (16384). Broken down into individual columns, this would make the sheet very large and slow to parse, even if only one column is actually used in the sheet. A compromise was made: if the last `CT_Col` object has its `max` field set to the maximum number of columns in Excel, the `max` field is set to the larger of:
- the `min` field + 1 of that last `CT_Col` object
- the column index of the cell with the highest column index in the sheet
This is a compromise because, if it is applied to a workbook where all columns really were meant to be formatted the same way, the information about the columns beyond this calculated `max` value will be lost. The good news is that this is usually not intended: a last `CT_Col` object whose `max` field is set to the maximum number of columns in Excel is typically just the result of hastily applying formatting to the whole sheet.
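The clamping rule for the last span can be written as a small function. This is a minimal Python sketch under stated assumptions, not the PR's implementation: `clamp_last_span` and its parameters are hypothetical names, "`min` field + 1" is read as `min + 1`, and the last used cell's column is assumed to use the same numbering as `min` and `max`.

```python
EXCEL_MAX_COLUMNS = 16384  # maximum number of columns, as stated in the description


def clamp_last_span(span_min: int, span_max: int, last_used_column: int) -> int:
    """Return the `max` value to keep for the last column span.

    Only a span that runs to the end of the sheet is shortened; any other
    span keeps its original `max`.
    """
    if span_max < EXCEL_MAX_COLUMNS:
        return span_max
    # Keep at least two columns of the span, or everything up to the
    # right-most cell that actually exists, whichever is larger.
    return max(span_min + 1, last_used_column)


print(clamp_last_span(1, 16384, 3))   # 3: the right-most cell is in column 3
print(clamp_last_span(5, 16384, 2))   # 6: no cells inside the span, keep min + 1
print(clamp_last_span(5, 10, 40))     # 10: span does not reach the sheet end
```

Formatting beyond the returned value is what gets dropped in the trade-off described above.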
The PR comes with a lot of tests that cover the new `XSSFColumn` object and the refactored `XSSFSheet` object.