PhpSpreadsheet Experiment to replace the large array of Excel function definitions with a FlyWeight Pattern collection of Value Objects

This is:

- [ ] a bugfix
- [ ] a new feature
- [X] refactoring

Checklist:

[X] Changes are covered by unit tests
[X] Code style is respected
[X] Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
[ ] CHANGELOG.md contains a short summary of the change
[ ] Documentation is updated as necessary

Why this change is needed?

Experiment to replace the large "unstructured" array of Excel function signature definitions with a structured collection of Value Objects to see what affect it has on memory usage and speed.

This isn't a perfect implementation yet: but it's enough to take a look at the memory usage and speed of the new solution.

Hopefully, this approach will also make Excel function definitions/implementations easier to maintain in future, and potentially allow new benefits such as defining the Excel version where they became available, and also if Microsoft have deprecated them.

Mar 26 '22 19:03 MarkBaker

@oleibman - Thought you might like to take a look at this, replacing the large array of Excel function signatures in Calculation with a collection of Value Objects that's lazy-loaded as functions are referenced. Entries in the collection; and properties in the VOs can be accessed either as array elements or as object properties.

It's still not 100% yet, there's still some work I need to do on it; but it does seem to show both memory and speed improvements on the test suite (despite the one test that loads every single VO in the collection)... at least there's lower memory usage for PHP 7.4 and PHP 8.0; but all versions runs in the CI tests seem faster. How that will translate into real-world end-user performance, I don't know; but it should still be better.

Mar 27 '22 12:03 MarkBaker

@MarkBaker, I will certainly take a look. However, I will be offline for personal reasons for about a week, so will probably not be able to get back to you before late next week.

Mar 29 '22 19:03 oleibman

@MarkBaker, as usual I am learning so much from reviewing this. It will take a while for me to put it all together. For now, I do have one question. You have moved the initialization of the large array phpSpreadsheetFunctions out of Calculation/Calculation and into Calculation/Engine/ExcelFunctions. But, even in the latter, you still have a large array, simpler than the original but still large. Might it be possible to just initialize that to an empty array (or possibly just the false entries if any) and populate it entirely on demand?

Apr 07 '22 14:04 oleibman

Might it be possible to just initialize that to an empty array (or possibly just the false entries if any) and populate it entirely on demand?

Also possible: The new array is a lot smaller than the original array; but it does still use a not insignificant amount of memory.... it took me a while to get this working "as is", but a version of the ExcelFunctions that tests for in_array(), and if not then tests if there is a signature definition using class_exists(), loading it into the array if there is... I had thought about it at the time, but ran out of weekend to experiment, and haven't had the time to revisit it since. The other variant there is that class_exists() is faster when called without loading the class; but as we'd need to load and instantiate it anyway if class_exists() returned a true, I'm not sure that would make any noticeable difference.

I'll have a go at that on a separate branch next week, so that we can compare memory/speed results for both variations.

Given that only a small number of Excel functions are typically used in any given spreadsheet; but that those few functions are often used repeatedly, it did seem a good target for memory optimisation; though I was a bit concerned about the performance cost of the additional "load on demand"... that pattern of real world demand should see both memory and performance benefits, even if our test suite doesn't fully reflect that. Though I was pleased that there didn't seem any real performance overhead with the change, and the memory saving is visible in the unit test runs, even though there are tests there that would load every function definition.

Apr 07 '22 19:04 MarkBaker

Just as a warning: the biggest issue that I had with this was the large number of additional classes slowing PHPStorm's re-indexing, and making it slower overall... I doubt that would be a problem once I switch to my new laptop, with more memory and a faster processor, but it is a maintenance overhead. That might be a result of 500 classes in a single directory; and I did try breaking them into subdirectories; but that created extra run-time overhead, and seemed to have no benefit in PHPStorm at all; so I left them all in the one folder.

Apr 07 '22 19:04 MarkBaker

And if I do start from an initial empty array, I'll also have to play around a bit with the logic that builds the list of implemented functions when we do a release; because that uses the array keys as its base data; but I'm sure I can work out an alternative reading through the Engine/Functions folder to handle that. Slower release isn't real problem if we're improving run-time performance.

Apr 07 '22 19:04 MarkBaker