[Python] Add FastAPI-Granian tests
FastAPI already has tests for servers like Uvicorn and Hypercorn, yet there are none with Granian, even though Granian is already benchmarked on its own. The proposed tests still use ASGI rather than RSGI (which could have drawn an even sharper contrast), but I believe they are still a valuable point of comparison and can show a performance difference significant enough to matter.
For example, running the fortune and query tests on my machine (M1 Max, 64 GB) produces the following:
| Test Name | Fortune (Best) | Query (20) |
|---|---|---|
| fastapi | 590636 | 66597 |
| fastapi-uvicorn | 610127 | 68038 |
| fastapi-granian | 463060 | 69995 |
The tests themselves are 100% the same as for the other flavors; this PR only adds the corresponding container files and extends the benchmark configuration. The Python version of the base container image is also the same: 3.11.
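For anyone who wants the gist without opening the diff, the new container file boils down to something like the sketch below. The base image tag, module path, port, and flags here are illustrative placeholders rather than a verbatim copy of the files in the PR:

```dockerfile
# Illustrative sketch only: module path, versions, and exact flags are
# placeholders, not a verbatim copy of the files added in this PR.
FROM python:3.11

WORKDIR /fastapi

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt granian

COPY . ./

EXPOSE 8080

# Run the same FastAPI ASGI app under Granian, one worker per CPU core
CMD granian --interface asgi app:app --host 0.0.0.0 --port 8080 --workers "$(nproc)"
```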
/cc @gi0baro
Hi @Aeron - We are no longer allowing more than 10 tests per framework at a time (see https://github.com/TechEmpower/FrameworkBenchmarks/issues/8420). The toolset will eventually only take the first 10, because a number of frameworks have more than 10 and need to be updated. Please update this to remove 2 tests.
Hey, @NateBrady23,
I don’t think I have the authority to decide which tests should go. The current number of tests for FastAPI, including the proposed ones, is 13. Should I use my best judgment, or are there any guidelines on which to consider most valuable/suitable?
Should I consider the last round’s results? For example, the nginx-unit and nginx-unit-orjson tests are not there and are marked as broken, so I would assume it’s safe to remove those two. Yet that still leaves us with 11 tests.
I’m not sure how to narrow it down further. The socketify-* variants only have plaintext and JSON tests, but that doesn’t necessarily mean they don’t deserve to be present.
Any tips? 😅
@Aeron I would ping the latest maintainers of the tests to see if they have a preference. You can remove the tests from the benchmark_config but leave the test implementations in case they want to add them back later.
@Aeron I already discussed in several places why I don't think benchmarking a single framework across several servers is valuable, considering the benchmarks already include those servers on their own (see for reference https://github.com/TechEmpower/FrameworkBenchmarks/pull/8055, https://github.com/TechEmpower/FrameworkBenchmarks/issues/8420#issuecomment-1721586004, https://github.com/emmett-framework/granian/discussions/62).
This is especially true considering the composite scores might advertise misleading information (see https://github.com/TechEmpower/FrameworkBenchmarks/issues/8420#issuecomment-1846626816). For instance, if you check the Python composite scores for round 22, FastAPI is in 2nd place, but:
- the plaintext numbers reported are the ones from the socketify run on PyPy, which (IMHO) does not represent real-world deployments
- the same goes for the JSON test
And the same considerations apply to Falcon (3rd place, where, again, the plaintext and JSON numbers come from the PyPy socketify runs).
So, given the current context, and seeing no proposals to revisit how "composite scores" are evaluated (maybe we can start a discussion about that, @NateBrady23?), I don't see what valuable information is added by expanding framework matrices with different servers.
Nevertheless, if the community has a different opinion on this, I suggest switching the threading mode to runtime and using 2 threads, as you're leaving performance on the table with the current configuration.
@gi0baro Yeah, I did my homework and read those discussions before, and I mostly agree. But for now, it is what it is. It doesn’t look like the community is eager to change the approach.
Weirdly, it seems the ∂R isn’t always identical across frameworks; I certainly saw a similar argument in one of those discussions. In an ideal world, it would be, yes.
Arguably, it’s more important for “regular” users to see an explicit, neck-and-neck comparison of how different frameworks behave on top of various servers. Maybe it’s not ideal or precise, and a bit flawed, as you pointed out, yet it’s valuable for many folks, even if it smells a little like marketing. It’s a single point of reference, you know, on which anyone interested enough can build a more educated comparison and analysis.
So, I think it’s worth a shot. For now, at least.
At this point, it’s not even certain I’ll be able to reach all the original maintainers of the tests I think should go. We’ll see.
> I suggest switching the threading mode to runtime and using 2 threads, as you're leaving performance on the table with the current configuration.
Sure, I’ll look into it. Thank you for pointing it out.
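If I read the Granian CLI correctly, that would boil down to tweaking the start command roughly as below. The flag names come from my reading of the Granian docs, and the module path and port are the same placeholders as above, so treat this as a sketch rather than the final change:

```dockerfile
# Rough sketch of the suggested tweak (flag names as I understand them from
# the Granian CLI): runtime threading mode with 2 threads per worker.
CMD granian --interface asgi app:app --host 0.0.0.0 --port 8080 \
    --workers "$(nproc)" --threading-mode runtime --threads 2
```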