Enhancement of the verification procedure
The objective is to identify possible improvements to the new verification procedure:
Too strict checking for db queries?
Sometimes the test fails for 1 or 2 missing db queries (out of 20,480 on the Update test), while the other statistics are within the acceptance margin. This is the case for the following frameworks:
- Laravel-swoole Update test
- php-eloquent Update test
- ninja-standalone Update test
- spiral Update test
- aspcore-mono-* Update test
- sinatra-unicorn-mri Update test
As the statistics on the number of db queries seemed reliable, we had not allowed any margin on this number.
I'd be interested to know why 1 or 2 queries are missing... Perhaps we could also add an acceptance margin on queries for these cases?
Failures due to the internal processing of the framework
The case has already been identified for java/hibernate: because of the random aspect of the test, Hibernate caches queries it has already executed, which distorted the results for the Query and Update tests.
The solution was to randomly select unique numbers.
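For illustration, selecting unique numbers amounts to sampling without replacement; a minimal Python sketch, assuming the benchmark's usual 1-10000 id range:

```python
import random

# Drawing the ids without replacement guarantees no repeats within a
# request, so a query cache can never short-circuit an expected query.
ids = random.sample(range(1, 10001), 20)
```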
It is possible that other frameworks, Java or not, are in this situation; I think it is probably the same for the following frameworks:
- About Hibernate: AFAIR most of the Java projects that are using Hibernate were updated to use the JPA interfaces. If the project is on Hibernate version 5.4.1 this should be true. Hibernate and EclipseLink are implementations of the JPA interface/standard. I see that Java/act with EclipseLink doesn't have a problem. I haven't inspected the code to check what's going on there.
So it should be possible, and relatively easy, to migrate the projects to EclipseLink* as the JPA provider. I don't see this option mentioned in the previous discussions.
*I haven't done such a migration.
Hibernate also has the StatelessSession, but it is not possible to use it through the JPA interfaces. The difference from the normal Session is exactly the absence of the Level 1 object cache. It's an option, but I wouldn't advise it.
- About the use of 512 concurrency:
Updated: (see here)
~~Currently MySQL has a weird default configuration activated. If there are 200 or more pending writing transactions it will throw a TransactionDeadlockException. You will stumble upon such examples. So far I've tried to lower the maximum connections from the connection pool, but these exceptions are still there (the grizzly-jersey and undertow-jersey). How the code manages to create 200 pending transactions from 192 connections is another discussion. Only the Java/act-eclipselink-mysql and Go/fasthttp frameworks are not throwing such an error. The results are from the run before the stricter validation.~~
Added: another case is a false positive when the framework is configured in a particular way: https://github.com/TechEmpower/FrameworkBenchmarks/pull/5225#issuecomment-553337876
~~Most probably this default behaviour should be disabled.~~
I have finally reconsidered my position about tests failing for 1 or 2 missing db queries. I think it would be wrong to add an acceptance margin at this time, as it would mask this issue.
Having modified the code of some frameworks so that they pass the tests again, it appears for the moment that the failures, even marginal ones, have real causes. They therefore help to improve the benchmark code.
The case of Symfony is for the moment one of the most revealing: it was simply missing 1 or 2 db queries on the Update test. Flushing the 20 updates in a single transaction was most likely the cause (to be confirmed in the next runs).
What do you think, @nbrady-techempower?
@zloster What do you think about setting innodb_table_locks = 0 together with innodb_lock_wait_timeout? That would resolve the issue of 200 or more pending transactions. The question is what the difference in the test would be when an actual deadlock occurs.
@jcheron I'm inclined to agree with you here. Let's keep it as is for now. It's helping discover real issues and that's a huge win.
@nbrady-techempower My understanding is that innodb_deadlock_detect should be OFF. Also, innodb_lock_wait_timeout should have the same value as its PostgreSQL equivalent. For the current very short benchmarking bursts, IMO the range should be 15-30 seconds. This way we would get an exception fairly early.
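For concreteness, a my.cnf sketch of the settings discussed above (the values follow the suggestions in this thread and are assumptions, not a verified benchmark configuration; innodb_deadlock_detect is available as of MySQL 5.7):

```ini
[mysqld]
# Skip the deadlock-detection pass; rely on the lock wait timeout instead.
innodb_deadlock_detect = OFF
# Fail fast during the short benchmark bursts (the default is 50 seconds).
innodb_lock_wait_timeout = 20
```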
We figured out in Symfony how to fix the deadlock issue (for the record, a deadlock occurs when process 1 updates A then B while process 2 updates B then A). With 512 concurrent requests of 20 queries each, there is a non-negligible probability of updating the same rows in a different order.
To solve that, Symfony orders the updates by id (#5237), as sketched below.
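A minimal sketch of that deterministic-ordering fix, written here in Python with a DB-API connection and psycopg2-style placeholders (the actual fix lives in Symfony's PHP code, see #5237; the table and column names assume the benchmark's usual world schema):

```python
import random

def update_randoms(conn, count=20):
    """Update `count` random rows, taking the row locks in ascending id
    order so that two concurrent requests cannot deadlock each other."""
    ids = random.sample(range(1, 10001), count)  # unique ids, as above
    with conn:  # one transaction; commits on success
        cur = conn.cursor()
        for row_id in sorted(ids):  # the fix: a deterministic lock order
            cur.execute(
                "UPDATE world SET randomnumber = %s WHERE id = %s",
                (random.randint(1, 10000), row_id),
            )
```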
Regarding the initial issue "Too strict checking for db queries?":
There is an issue with some (overly optimized) ORMs. For instance, by default the PHP Doctrine ORM doesn't execute an update query if nothing has changed. With 512 concurrent requests of 20 updates each (total = 10,240 queries), when changing the value with a 5-digit random number there is a non-negligible chance of picking the same number and not triggering the update (a ~65% chance that at least 1 update won't change anything).
In my opinion the rules are respected, but the verification is too strict. If the value did not change, a SQL query doesn't have to be executed. I suggest either allowing 1 missing query per 10,000 requests, or using 7-digit random values in order to reduce the collision probability to 0.2%.
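These figures can be sanity-checked with the standard collision formula; a quick sketch (the exact percentages depend on how many update queries are counted):

```python
def p_noop(k, n):
    """Probability that at least one of k random updates draws the value
    the row already holds, with values uniform over n possibilities."""
    return 1 - (1 - 1 / n) ** k

print(f"{p_noop(10_240, 10**4):.1%}")  # 64.1% -- the ~65% figure above
print(f"{p_noop(10_240, 10**7):.2%}")  # 0.10% -- the order of the quoted 0.2%
```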
Doctrine is certainly optimized, which caused the Level 1 cache problem, but the existing code for the Doctrine-Symfony Update test did not properly manage concurrent access to the database (deadlocks), which is now solved, in part thanks to this missing margin of error.
The difficulty with a more flexible check is that deadlocks generating just a few missing queries would go unnoticed. And perhaps other phenomena, not yet identified, would never be noticed either.
@nbrady-techempower @jderusse I stand corrected. I've updated the post.
Thanks everyone for putting so much effort into this!
@nbrady-techempower @jcheron I get an error when I run `vagrant@TFB-all:~$ tfb --mode verify --test dropwizard-postgres --concurrency-levels 128`:
```
Verifying test fortune for dropwizard-postgres caused an exception: can't multiply sequence by non-int of type 'float'
FAIL for http://tfb-server:9090
Caused Exception in TFB
This almost certainly means your return value is incorrect,
but also that you have found a bug. Please submit an issue
including this message: can't multiply sequence by non-int of type 'float'
Traceback (most recent call last):
  File "/FrameworkBenchmarks/toolset/benchmark/framework_test.py", line 113, in verify_type
    results = test.verify(base_url)
  File "/FrameworkBenchmarks/toolset/benchmark/test_types/fortune_type.py", line 47, in verify
    problems += verify_queries_count(self, "fortune", url, concurrency, repetitions, expected_queries, expected_rows)
  File "/FrameworkBenchmarks/toolset/benchmark/test_types/verifications.py", line 433, in verify_queries_count
    problems.append(display_queries_count_result(queries * queries_margin, expected_queries, queries, "executed queries", url))
  File "/FrameworkBenchmarks/toolset/benchmark/test_types/verifications.py", line 451, in display_queries_count_result
    if result > expected_result * 1.05:
TypeError: can't multiply sequence by non-int of type 'float'
```
The whole console log is in this GitHub gist.
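The traceback suggests that expected_queries reached the comparison as a sequence rather than a number, which would be consistent with the concurrency value arriving as an un-cast string; a hypothetical minimal reproduction (an assumption about the cause, not a confirmed diagnosis):

```python
# Hypothetical reproduction: a CLI value that is never cast to int
# survives integer multiplication (string repetition) and only fails
# once it is multiplied by a float.
concurrency = "128"                  # string, not int
expected_queries = concurrency * 20  # "128128128..." -- no error, just garbage
try:
    expected_queries * 1.05          # the comparison's float multiplication
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'float'
```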
@jcheron @nbrady-techempower Is it possible to add some form of delay before the verification queries are executed? For example, a command-line option with a default of 0 seconds. I'm having a hard time finding out why the dropwizard/PostgreSQL variants are not able to pass the verifications.
What about just starting with `--mode debug` so you can send your own requests?
With `--mode debug` and manual execution of the reporting query extracted (sort of) from the code, I don't get discrepancies.
@zloster I'm on mobile right now so I can't verify. Can you try putting `--concurrency-levels [128]` (as a list)? If that's the issue I'll update the toolset tomorrow to fix it.
@nbrady-techempower I think we have a misunderstanding - the change about concurrency-levels is OK and working.
My request is about adding some sort of configurable delay before the execution of the verification SQL queries that gather the statistics on the number of executed DB operations.
Oh, sorry! I was also reading the post before and didn’t realize it was from November. But, sure, I can take a look at that tomorrow for you.