PhpSpreadsheet
PhpSpreadsheet copied to clipboard
Different results when calling getHighestRowAndColumn and getHighestColumn + getHighestRow
This is:
- [X] a bug report
- [ ] a feature request
Calling getHighestRowAndColumn, getHighestColumn + getHighestRow provides different results on same sheet.
What is the expected behavior?
Probably should return same results?
What is the current behavior?
In my test sheet:
getHighestRowAndColumn returns C33
getHighestColumn returns ALX
getHighestRow returns 33
So we have two ranges: A1:C33 & A1:ALX33
What are the steps to reproduce?
<?php
$reader = IOFactory::createReaderForFile('phpspreadsheet-test.xlsx');
$spreadsheet = $reader->load($file);
$worksheet = $spreadsheet->getActiveSheet();
$worksheet->garbageCollect();
var_dump($worksheet->getHighestRow(), $worksheet->getHighestColumn(), $worksheet->getHighestRowAndColumn());die;
What features do you think are causing the issue
- [X] Reader
- [ ] Writer
- [ ] Styles
- [ ] Data Validations
- [ ] Formula Calulations
- [ ] Charts
- [ ] AutoFilter
- [ ] Form Elements
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
Appears to be random.
Which versions of PhpSpreadsheet and PHP are affected?
1.23.0
This isn't random: the source of the data is different for these methods.
getHighestColumn() and getHighestRow() return cached values that are read from the file when the worksheet is loaded, and are based not only on whether cells contain data, but also on whether styles or data validation are set for cells (even empty cells), or whether there is height/width information for rows/columns, or print areas or other similar settings. They reflect the highest row/column for which any information is defined in the file for that worksheet.
Two alternative methods, getHighestDataColumn() and getHighestDataRow() also exist, and these work directly against the cell collection. They're less efficient (although I've recently made changes that make them a lot faster) because they check the cell collection every time they're called, but they accurately return the highest row/column that have cell entries in the worksheet.
The getHighestRowAndColumn() method is badly named: it actually uses getHighestDataColumn() and getHighestDataRow() to return the highest column/row from the cell collection. This should really be renamed getHighestDataRowAndColumn(), and getHighestRowAndColumn() should be defined as a method that combines getHighestColumn() and getHighestRow(), but that change would be a bc break with the existing v1.x code. You can expect the discrepancy to be resolved in v2.0
It's also worth noting that the getHighestColumn() and getHighestRow() methods support an optional row or column argument. If they are called with that argument, then they call getHighestDataColumn() and getHighestDataRow() internally, so in that specific circumstance they work against the cell collection
Hello
Looks some similar trouble. In your use case if you try getColumnDimensions() do you get a different value also ?
A column can have a dimension registered in the file but contain no data for any of the cells in that column. In this case, it will be registered by the getHighestColumn() result, but not by the getHighestDataColumn() result. If it's the last column, then there will be a discrepancy between the two functions, it's why we have two distinct functions.