PhpSpreadsheet icon indicating copy to clipboard operation
PhpSpreadsheet copied to clipboard

`Callculation::resizeMatricesExtend()` overwrites all rows in the smaller matrix.

Open mklepaczewski opened this issue 6 months ago • 3 comments
trafficstars

The bug

Calling Callculation::resizeMatricesExtend() with matrices of different sizes overwrites all rows in the smaller matrix with the value found in the last row. I haven't found the expected behaviour of the method (comments of the method don't specify it), but I think the idea was to extend the smaller matrix with values found in the last row, not to overwrite all of it with the value found in the last row.

Example:

<?php
declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Calculation\Calculation;

// Sample matrices to test with
$matrix1 = [[1], [3]];
$matrix2 = [[5], [8], [11]];

// Use reflection to make the protected method accessible
$calculation = new Calculation();
$reflectionMethod = new ReflectionMethod(Calculation::class, 'resizeMatricesExtend');
$reflectionMethod->setAccessible(true);

// Call the method using reflection
$reflectionMethod->invokeArgs($calculation, [&$matrix1, &$matrix2, count($matrix1), 1, count($matrix2), 1]);

echo "Matrix 1 after:\n";
print_r($matrix1);

(Presumably) Expected output:

$matrix1 = [ [1], [3], [3] ];

Actual output

$matrix1 = [ [3], [3], [3] ];

Relevant code: https://github.com/PHPOffice/PhpSpreadsheet/blob/fb2cfed070f16091e842a2b027d2d15c5a39e2f9/src/PhpSpreadsheet/Calculation/Calculation.php#L871-L876

Note that the analogous code for columns does not overwrite the previous columns, it extends them with the last value: https://github.com/PHPOffice/PhpSpreadsheet/blob/fb2cfed070f16091e842a2b027d2d15c5a39e2f9/src/PhpSpreadsheet/Calculation/Calculation.php#L863-L870

Proposed change

            if ($matrix1Rows < $matrix2Rows) {
                $x = $matrix1[$matrix1Rows - 1];
                for ($i = matrix1Rows ; $i < $matrix2Rows; ++$i) {
                    $matrix1[$i] = $x;
                }
            }

Please note that analogous change should be done for case where $matrix2Rows < $matrix1Rows.

I can create a PR if the change is accepted.

Why it matters

We found that some formulas that work for us in Excel result in #CALC! in PhpSpreadsheet. As an example, the following type of formulas result in #CALC!: =FILTER(Products!M:M;(Products!K:K=D5)*(Products!L:L=E5)). Products!K:K=D5 and Products!L:L=E5 generate matrices of different sizes (for whatever reason). PhpSpreadsheet extends the smaller matrix and overwrites all of its values. In effect, FILTER() fails to match any rows and results in #CALC!

Expected result

I don't know if the above behaviour is expected, but I suspect it's a bug. I haven't found any description of this behaviour. If it's not a bug, documenting it might be a good idea.

mklepaczewski avatar Apr 24 '25 12:04 mklepaczewski

Yes, please go ahead and create a PR with your changes and new test cases, and I will evaluate it. It would be preferable for your test cases to run in the context of a spreadsheet (e.g. your #CALC! formulas) rather than, or at least in addition to, your Reflection example. Feel free to add as many comments as you think the method requires.

oleibman avatar Apr 24 '25 17:04 oleibman

I have looked into this some more, and will probably create a PR on my own. There's a lot going on here.

Extending the last row or column doesn't really make sense to me. I think this was probably intended to expand a 1*1 matrix to the size of is corresponding operand. I could easily be wrong, but let's go with that for now. In that case, what we want to do is repeat the last row (if there is one row) or column (if there is one column), or otherwise null-fill. This seems to not break anything, and get a reasonable result for your test case.

I admit that I'm not entirely sure what a reasonable result for your test case is. If you use the entire column, PhpSpreadsheet will wind up using 1,048,576-element arrays, which, given enough time and memory (which you probably don't have), will probably work. So I reformulated your formula as:

=FILTER(Products!M1:M9;(Products!K1:K8=D5)*(Products!L1:L9=E5))

Note that the K array has 1 fewer entry than M or L. This matches your symptom (CALC error) with the old logic, but the new logic handles it "correctly". At least it probably meets your expectations. But ... using that formula in Excel results in an error (I've seen VALUE and N/A and I think CALC but I can't remember how to duplicate that)! So perhaps the original result was correct, for the wrong reason. I need time to think about how this should really work.

oleibman avatar May 13 '25 02:05 oleibman

I'm sorry for not creating the PR yet. Life happened ;-)

I admit that I'm not entirely sure what a reasonable result for your test case is.

I don't know if compatibility with Excel is a goal of this project, but as a user, that would be my expectation. If I remember correctly, Excel handles it by repeating the last row/column. What I'm sure of is that overwriting the smaller matrix is not the expected result.

Please don't worry about my case. We'll manage it on our end.

Thank you for the hard work you put into the project!

mklepaczewski avatar May 13 '25 05:05 mklepaczewski