Discovery cache improvements
Let's look into whether we can improve the discovery cache for local development:
- Partial discovery cache: only discover files that have been changed or added according to local git history
- Always cache vendor discovery: vendor files can only be changed on composer update/install. Let's add a post-install script that will re-store discovery cache, but let's not re-scan vendor files when not needed
- Have a discovery cache manager so that paths don't have to be hard-coded in each discovery class
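To make the first bullet concrete, here's a minimal sketch of how a "partial discovery" file list could be derived from git: tracked files modified since the last commit, plus new untracked files. This is a hypothetical illustration, not Tempest's implementation; the throwaway repo below exists only to make the demo self-contained.

```shell
#!/bin/sh
# Hypothetical sketch: list the files a partial discovery pass would re-scan.
# The temporary repo is only here to make the demo runnable anywhere.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
printf '<?php class Alpha {}\n' > Alpha.php
git add Alpha.php
git commit -qm 'initial'

# Simulate local development: one edited file, one brand-new file.
printf '<?php class Alpha { public int $x = 1; }\n' > Alpha.php
printf '<?php class Beta {}\n' > Beta.php

# Candidate list for partial re-discovery:
git diff --name-only HEAD                 # changed tracked files
git ls-files --others --exclude-standard  # new untracked files
```

Everything else (unchanged tracked files, and all of `vendor/`, per the second bullet) could then be served straight from the cache.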
@brendt - I have some thoughts here...
While less developer friendly, have you tried parsing files with nikic/php-parser at all? I'd imagine it'd be faster, but unsure. We'd also avoid some runtime issues we have today (e.g., if you run discovery across all files in the vendor dir, you might load a Symfony class relying on a dependency not installed, etc.).
I'm wondering if we could build a few helpers over top with some basic AST checks (e.g., Attributes, whether a class extends another, whether a class implements another, methods, parameters, etc.).
We might be able to also use this method to pull caching out of the hands of individual discoverers and only handle it at a discovery manager level.
> While less developer friendly
😱
> have you tried parsing files with nikic/php-parser at all? I'd imagine it'd be faster, but unsure
Why? Compared to reflection? Noooo definitely not, reflection is super fast
> if you run discovery across all files in the vendor dir, you might load a Symfony class relying on a dependency not installed
We shouldn't though? We're only scanning Tempest packages. On top of that, we're already catching runtime exceptions and are simply not loading those classes
> I'm wondering if we could build a few helpers over top with some basic AST checks (e.g., Attributes, whether a class extends another, whether a class implements another, methods, parameters, etc.).
We have this with the reflector helpers — I find them really useful, don't you?
> We might be able to also use this method to pull caching out of the hands of individual discoverers and only handle it at a discovery manager level.
That is the goal, but I don't see why we'd need to switch reflection for it?
> We shouldn't though? We're only scanning Tempest packages. On top of that, we're already catching runtime exceptions and are simply not loading those classes
But what exactly is a Tempest package? Can I make my own Tempest package which also gets autodiscovered?
@Wulfheart every package that has a dependency on a Tempest core package is considered a Tempest package. Right now you'd need to require e.g. `tempest/core` and your package will be discovered. However, I plan on adding some kind of "package tools package" that all third-party packages should require.
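In other words, a third-party package would opt into discovery simply by declaring that dependency. A sketch of such a package's `composer.json` (the package name and version constraint are made up for illustration):

```json
{
    "name": "acme/tempest-widgets",
    "description": "Hypothetical third-party Tempest package",
    "require": {
        "tempest/core": "^1.0"
    }
}
```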
Store discovery cache as plain PHP instead of manually serializing it
I played around with several cache pool adapters; the fastest one is actually serializing the payload and storing it as a file. E.g., `PhpFileCacheAdapter` is slower. There might be room for much more improvement, but I'm going to skip this refactor for now. I still plan on looking into partial caching though.
Do you have benchmarks of Tempest with and without the cache? I'm curious as to how Discovery impacts performance
Taking note of some benchmarks, this is `time ./tempest`:
| branch | cache enabled | cache disabled | partial cache enabled |
|---|---|---|---|
| framework (main) | 0.12s | 0.16s | / |
| framework (partial) | 0.11s ✅ | 0.21s ❗️ | 0.20s ❗️ |
| project (main) | 0.11s | 0.15s | / |
| project (partial) | 0.11s ✅ | 0.15s ✅ | 0.10s ✅ |
| big project (main) | 0.42s | 0.74s | / |
| big project (partial) | 0.36s ✅ | 0.71s ✅ | 0.58s ✅ |
Note that the framework code will benefit less, or not at all, from these changes, since there are no vendor files to discover in the monorepo.
The main performance bottleneck comes from the following: if we want to allow partial discovery, we need to introduce extra checks to determine whether a specific discovery location should be crawled for a specific discovery class. E.g., if partial cache is enabled, all discovery locations in vendor/ should be skipped, but ONLY if there is a valid cache entry for each of them. If that's not the case, we have to warm the cache the first time. This per-location check adds overhead, since we now need to check cache status for every location/discovery combination.
BTW, this is the branch I'm working on: https://github.com/tempestphp/tempest-framework/tree/partial-discovery-cache
Update: if we manually generate the discovery cache with the `discovery:generate` command (run it on deploy and after composer install), we won't have to worry about doing extra checks. The upside is that partial discovery does indeed have a performance gain in that scenario. The downside is that people will need to remember to run `discovery:generate`. We can automate it as much as possible, and we can add error messages to let people know they forgot to run it in some cases.
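For reference, the "run it after composer install" part could be wired up via Composer's script events. A sketch of the relevant `composer.json` fragment (the exact binary name is an assumption based on the `discovery:generate` command mentioned above):

```json
{
    "scripts": {
        "post-install-cmd": [
            "php tempest discovery:generate"
        ],
        "post-update-cmd": [
            "php tempest discovery:generate"
        ]
    }
}
```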
Benchmarks for running our test suite
| branch | test suite |
|---|---|
| main | 00:16 |
| partial-discovery-cache | 00:15 ✅ |
TODO after merge:
- [x] Document partial discovery cache
- [x] Composer hook to automatically generate discovery cache
- [x] Testing, but it should be done outside our normal test suite (since clearing caches affects the whole test suite). I think the best approach is to move discovery-related cache tests to the `tempest/app` test suite
> Taking note of some benchmarks, this is `time ./tempest`
>
> | branch | cache enabled | cache disabled | partial cache enabled |
> |---|---|---|---|
> | framework (main) | 0.12s | 0.16s | / |
> | framework (partial) | 0.11s ✅ | 0.21s ❗️ | 0.20s ❗️ |
> | project (main) | 0.11s | 0.15s | / |
> | project (partial) | 0.11s ✅ | 0.15s ✅ | 0.10s ✅ |
> | big project (main) | 0.42s | 0.74s | / |
> | big project (partial) | 0.36s ✅ | 0.71s ✅ | 0.58s ✅ |
To make some sense of those numbers, I would like to know approximately how many files and classes are in the big project, and what the machine specs are that ran `time ./tempest`.
Looking at those numbers, I am a bit worried about how it performed. Taking around a minute to warm the cache is pretty significant.
> Taking around a minute to warm the cache is pretty significant
It didn't. 1 minute = 60 seconds. The longest number here is 0.74s which is 740 milliseconds or 0.0123 minutes.
I still think we can get that down considerably.
> …are in the big project and what the machine specs are that ran `time ./tempest`.
These benchmarks shouldn't be used to compare absolute numbers, but I can give some more info:
- The big project benchmark counted 2000 generated vendor files and 4000 generated source files. We use https://github.com/tempestphp/tempest-benchmark/blob/main/setup/setup.php to generate such projects (although this version doesn't include the vendor files; that's a local change I still have to push)
- It was run on a Mac Mini M1
- As Aidan pointed out: half a second, not 1 minute :)
Oh, sorry. My bad. I guess that's not to warm the cache then, and it is just to pass through the checks after the cache is already there.
But that's latency added to every request. Just to put it out there: the welcome page for Symfony in a dev env is 42ms on my machine (less powerful than a Mac Mini M1).
> I guess that's not to warm the cache then and it is just to pass through the checks after the cache is already there.
I'd suggest taking another look at the table above. That is with cache disabled.
I guess something else is going on then, because by that table it would be 220ms of overhead just for checking the partial caching, wouldn't it?