PhpSpreadsheet icon indicating copy to clipboard operation
PhpSpreadsheet copied to clipboard

Different results when calling getHighestRowAndColumn and getHighestColumn + getHighestRow

Open fliespl opened this issue 3 years ago • 4 comments

This is:

- [X] a bug report
- [ ] a feature request

Calling getHighestRowAndColumn, getHighestColumn + getHighestRow provides different results on same sheet.

What is the expected behavior?

Probably should return same results?

What is the current behavior?

In my test sheet: getHighestRowAndColumn returns C33 getHighestColumn returns ALX getHighestRow returns 33

So we have two ranges: A1:C33 & A1:ALX33

What are the steps to reproduce?

phpspreadsheet-test.xlsx

<?php

        $reader = IOFactory::createReaderForFile('phpspreadsheet-test.xlsx');
        $spreadsheet = $reader->load($file);
        $worksheet = $spreadsheet->getActiveSheet();
        $worksheet->garbageCollect();
        var_dump($worksheet->getHighestRow(), $worksheet->getHighestColumn(), $worksheet->getHighestRowAndColumn());die;

What features do you think are causing the issue

  • [X] Reader
  • [ ] Writer
  • [ ] Styles
  • [ ] Data Validations
  • [ ] Formula Calulations
  • [ ] Charts
  • [ ] AutoFilter
  • [ ] Form Elements

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

Appears to be random.

Which versions of PhpSpreadsheet and PHP are affected?

1.23.0

fliespl avatar Jul 05 '22 06:07 fliespl

This isn't random: the source of the data is different for these methods.

getHighestColumn() and getHighestRow() return cached values that are read from the file when the worksheet is loaded, and are based not only on whether cells contain data, but also on whether styles or data validation are set for cells (even empty cells), or whether there is height/width information for rows/columns, or print areas or other similar settings. They reflect the highest row/column for which any information is defined in the file for that worksheet.

Two alternative methods, getHighestDataColumn() and getHighestDataRow() also exist, and these work directly against the cell collection. They're less efficient (although I've recently made changes that make them a lot faster) because they check the cell collection every time they're called, but they accurately return the highest row/column that have cell entries in the worksheet.

The getHighestRowAndColumn() method is badly named: it actually uses getHighestDataColumn() and getHighestDataRow() to return the highest column/row from the cell collection. This should really be renamed getHighestDataRowAndColumn(), and getHighestRowAndColumn() should be defined as a method that combines getHighestColumn() and getHighestRow(), but that change would be a bc break with the existing v1.x code. You can expect the discrepancy to be resolved in v2.0

MarkBaker avatar Jul 05 '22 09:07 MarkBaker

It's also worth noting that the getHighestColumn() and getHighestRow() methods support an optional row or column argument. If they are called with that argument, then they call getHighestDataColumn() and getHighestDataRow() internally, so in that specific circumstance they work against the cell collection

MarkBaker avatar Jul 05 '22 10:07 MarkBaker

Hello

Looks some similar trouble. In your use case if you try getColumnDimensions() do you get a different value also ?

camlafit avatar Jan 14 '24 17:01 camlafit

A column can have a dimension registered in the file but contain no data for any of the cells in that column. In this case, it will be registered by the getHighestColumn() result, but not by the getHighestDataColumn() result. If it's the last column, then there will be a discrepancy between the two functions, it's why we have two distinct functions.

MarkBaker avatar Jan 15 '24 09:01 MarkBaker