
XSSFColumn class implementation

Open · artem-iron opened this issue 4 months ago · 1 comment

XSSFColumn class

This PR adds an IColumn interface (analogous to the IRow interface) and implements it in an XSSFColumn class. XSSFSheet is refactored to use the new class for all column operations, in much the same way that row operations are handled. This covers copying, cutting, pasting, shifting (taking formulas into account), formatting (styles, width, hidden state, grouping and ungrouping), adding new columns, and removing cells by column. In short, nearly everything that is possible with XSSFRow is now possible with XSSFColumn, through a more fluent and easier-to-use API than ColumnHelper provided.

The major hurdle was the way columns are stored in sheet.xml: CT_Col objects do not describe individual columns but spans of columns. If columns 5 to 10 share the same style, width, hidden status and outline level, they are represented by a single CT_Col object with a min field of 5 and a max field of 10. That is one span from 5 to 10 rather than six columns, which makes individual columns hard to work with.

For this reason, the PR also changes how CT_Col objects are parsed and stored in XSSFSheet. When a workbook is read from a file, each span is broken down into individual CT_Col objects whose min and max fields are equal, so that every CT_Col object represents exactly one column. When the workbook is written to a file, the CT_Col objects are merged back into spans according to their style, width, hidden status and outline level, matching the way Excel stores columns in sheet.xml.
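To make the split-on-read and merge-on-write round trip concrete, here is a minimal, language-neutral sketch in Python. It is not NPOI code. `ColSpan`, `split_spans` and `merge_spans` are hypothetical names, and `ColSpan` is only a stand-in for CT_Col that carries the four attributes named above, with a 1-based, inclusive min/max as in sheet.xml.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ColSpan:
    """Hypothetical stand-in for a CT_Col: columns [min, max], 1-based, inclusive."""
    min: int
    max: int
    width: Optional[float] = None
    style: int = 0
    hidden: bool = False
    outline_level: int = 0

    def same_format(self, other: "ColSpan") -> bool:
        # The attributes that decide whether neighbouring columns can share a span.
        return (self.width, self.style, self.hidden, self.outline_level) == (
            other.width, other.style, other.hidden, other.outline_level)


def split_spans(spans: List[ColSpan]) -> List[ColSpan]:
    """On read: expand every span into one entry per column (min == max)."""
    return [replace(s, min=i, max=i) for s in spans for i in range(s.min, s.max + 1)]


def merge_spans(cols: List[ColSpan]) -> List[ColSpan]:
    """On write: collapse runs of adjacent, identically formatted columns."""
    merged: List[ColSpan] = []
    for col in sorted(cols, key=lambda c: c.min):
        last = merged[-1] if merged else None
        if last is not None and last.max + 1 == col.min and last.same_format(col):
            merged[-1] = replace(last, max=col.max)  # extend the current span
        else:
            merged.append(col)  # gap or different formatting: start a new span
    return merged


spans = [ColSpan(5, 10, width=12.0)]
singles = split_spans(spans)
print(len(singles))                   # → 6
print(merge_spans(singles) == spans)  # → True
```

Because merging only joins columns that are both adjacent and identically formatted, a round trip over an unmodified span reproduces it exactly, while hiding one column in the middle of the span yields three spans on write.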

This approach still has a drawback. If all columns on the sheet share the same formatting all the way to the end of the sheet, the last CT_Col object in sheet.xml has its max field set to the maximum number of columns in Excel (16384). This makes the sheet.xml file very large and slow to parse, even if only one column is actually used in the sheet. A compromise was made: if the last CT_Col object's max field equals the maximum number of columns in Excel, its max field is reduced to the larger of:

  • the min field of that last CT_Col object, plus 1
  • the column index of the rightmost cell in the sheet (the highest column index of any cell)
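The clamping rule above can be sketched as follows, again as a hypothetical Python illustration rather than the PR's actual code. It assumes the spans are sorted by min and that `rightmost_used_column` uses the same 1-based numbering as min/max. NPOI's cell column indices are zero-based, so a real implementation would have to convert between the two. The cap at 16384 is an added safeguard for the edge case where the last span starts at the final column; the PR does not describe it.

```python
from dataclasses import dataclass, replace
from typing import List

EXCEL_MAX_COLUMNS = 16384  # highest column number (XFD) in the .xlsx format


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a CT_Col span: columns [min, max], 1-based, inclusive."""
    min: int
    max: int
    style: int = 0


def clamp_last_span(spans: List[Span], rightmost_used_column: int) -> List[Span]:
    """Shrink a trailing span that runs to the last Excel column.

    The new max is the larger of (last.min + 1) and the rightmost used
    column, capped at EXCEL_MAX_COLUMNS. All other spans are left unchanged.
    """
    if not spans or spans[-1].max != EXCEL_MAX_COLUMNS:
        return spans
    last = spans[-1]
    new_max = max(last.min + 1, rightmost_used_column)
    new_max = min(new_max, EXCEL_MAX_COLUMNS)  # safeguard not described in the PR
    return spans[:-1] + [replace(last, max=new_max)]


# A style applied "to the whole sheet" from column 4 onward, with data only in columns 1-2.
sheet_cols = [Span(1, 3, style=1), Span(4, EXCEL_MAX_COLUMNS, style=2)]
print(clamp_last_span(sheet_cols, rightmost_used_column=2))
# → [Span(min=1, max=3, style=1), Span(min=4, max=5, style=2)]
```

Here the formatting for columns 6 to 16384 is dropped, which is exactly the information loss the compromise accepts.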

It is a compromise because, on a workbook where all columns were intentionally formatted the same way, the formatting information for columns beyond the calculated max field is lost. Fortunately, this is usually not intended: a last CT_Col object whose max field equals the maximum number of columns in Excel is typically just the result of formatting hastily applied to the whole sheet.

The PR comes with extensive tests covering the new XSSFColumn class and the refactored XSSFSheet class.

artem-iron · Feb 09 '24 03:02