otp icon indicating copy to clipboard operation
otp copied to clipboard

Optimise accesses to the Table name -> Table id map

Open RobinMorisset opened this issue 2 years ago • 23 comments

Currently these accesses are guarded by (striped) read-write locks.

With this patch, there are still (regular) locks for protecting writing threads from each other, but threads that only need to read that map (so the overwhelming majority) no longer need to use any lock. Instead we now rely on lightweight atomic synchronisation, and on the thread_progress functionality for memory reclaimation.

One ugly hack is that if we add 1 table, then remove it, the corresponding bucket may go

  • inline, empty
  • inline, with a value
  • outlined vector without a value

Once we have a filled the inline vector, we can only transition to an outlined vector, and never go back to the inline state. This saves us from an ABA hazard:

  • Reader sees inline vector of size 1
  • Reader reads the name NameA that corresponds to table IdA, sees it matches what it looked for
  • Writer deletes (or rename the table), moving us to an empty vector
  • Writer inserts another table in that bucket, with a different name: NameB -> IdB
  • Reader reads IdB, concluding that NameA -> IdB. As long as we stick to outline vectors and always allocate a fresh one when deleting tables, we don't have that risk, as the vectors only get freed (and thus the adress may get reused by another vector) once all readers have made thread progress (and thus are no longer in the middle of an operation).

One consequence of this hack, is that we must lie to the memory counting system, pretending to have an outlined vector at all times, even when we only have the initial inlined vector. Otherwise, the tests detect the memory increase in inline (1 element) -> outlined (0 element) -> outlined (1 element), and report a memory leak.

This patch was motivated by noticing that heavyweight services can spend several % of CPU time in db_get_table_aux. A trial in production confirmed that this patch roughly halves the time spent in that function.

RobinMorisset avatar Apr 13 '23 14:04 RobinMorisset

CT Test Results

  2 files   15 suites   3m 37s ⏱️ 115 tests 110 ✅ 5 💤 0 ❌ 135 runs  130 ✅ 5 💤 0 ❌

Results for commit a6e675da.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

github-actions[bot] avatar Apr 13 '23 14:04 github-actions[bot]

ping @sverker

RobinMorisset avatar Sep 26 '23 09:09 RobinMorisset

ping @sverker

I'm sorry. I will try to get time to look at this again. I think it demands a proper review. Lock-less stuff are tricky.

sverker avatar Oct 04 '23 18:10 sverker

ping @RobinMorisset

Would love to see this merged :)

lpgauth avatar Jan 19 '24 18:01 lpgauth

oops, sorry for forgetting about that. I'm a bit busy this week, but I'll try to get this ready for merging next week.

RobinMorisset avatar Jan 23 '24 10:01 RobinMorisset

@RobinMorisset would be nice to have in OTP 27 :)

lpgauth avatar Feb 08 '24 13:02 lpgauth

This new version passes tests, so it should not be completely broken, but I've put the double-checking before the locking because the lock is not always taken (see my comment), so I'm not very confident yet that this last change is correct.

RobinMorisset avatar Feb 20 '24 10:02 RobinMorisset

I've checked and the new test catches that bug (triggers the segfault, which disappears with the fix). Thanks again for the careful review!

RobinMorisset avatar Feb 27 '24 10:02 RobinMorisset

Did you get the test case to trigger the race as it is, without erts_thr_yield and with only 1000 iterations?

sverker avatar Feb 27 '24 10:02 sverker

I got it to trigger the race as it is, but I had 1M iterations at the time. I reduced the number of iterations because it takes forever to run otherwise now that the race is fixed.

RobinMorisset avatar Feb 27 '24 11:02 RobinMorisset

Testing some more, 1k iterations does not trigger the bug, but 10k iterations is enough to trigger it reliably (2 out of 2 runs). But once I fix the bug, 10k iterations takes forever (to the point I'm wondering whether it is triggering some kind of deadlock or livelock or something).

RobinMorisset avatar Feb 27 '24 11:02 RobinMorisset

I found the issue: I was missing a db_unlock in the double-check branch! The test now runs fine (and very fast) with the fix, and immediately blows up if I re-introduce the segfault in that branch, confirming that it is being tested.

RobinMorisset avatar Feb 27 '24 11:02 RobinMorisset

You can also use undocumented spawn_opt option {scheduler,N} to make sure writer and reader are spawned on different schedulers.

sverker avatar Feb 27 '24 11:02 sverker

You should always also run debug built emulator when developing. That will detect the missing unlock.

sverker avatar Feb 27 '24 11:02 sverker

You should always also run debug built emulator when developing. That will detect the missing unlock.

How do you ensure that the debug version is used by the tests? I've just been following https://github.com/erlang/otp/blob/master/HOWTO/TESTING.md and doing

make stdlib_test ARGS="-suite ets_SUITE -case racy_rename"

RobinMorisset avatar Feb 27 '24 11:02 RobinMorisset

I don't know if there is a way to select emulator type when running tests like that.

I use the old way by releasing the tests: https://github.com/erlang/otp/blob/master/HOWTO/TESTING.md#releasing-tests

Then I can start whatever erl I want and run tests with ts:run.

$ERL_TOP/bin/cerl -debug
1> ts:install().
2> ts:run(stdlib,ets_SUITE,racy_rename,[batch]).

sverker avatar Feb 27 '24 12:02 sverker

I've just tried that, and ts:install() fails with ** exception error: undefined function ts:install/0. Is there an additional command I should run to bring ts in scope?

RobinMorisset avatar Feb 27 '24 14:02 RobinMorisset

You need to stand in the test_server directory.

The more modern variant is using common test: (though you will still need test_server in the path).

cd  TEST_INSTALL_PATH/test/app/
erl -pa /TEST_INSTALL_PATH/test/test_server -sname foo EXTRA_ARGS
> ct:run_test([{suite, ets_SUITE}, {testcase,racy_rename}]).

dgud avatar Feb 27 '24 14:02 dgud

What is TEST_INSTALL_PATH supposed to be? I tried doing $ERL_TOP/bin/cerl -debug -pa ../../tests/test_server in $ERL_TOP/release/tests/stdlib_test and then calling ct:run_test([{suite, ets_SUITE}, {testcase,racy_rename}]). in the resulting shell, but it gave me an error message "Failed to start CTH, see the CT Log for details" (with no information on how to find that log) and "*** FAILED {ets_SUITE,init_per_suite} ***". I also tried ts:install(), and it apparently found it, but it gave another error message: {error,no_configure_script}

Sorry for all the questions, I generally find the test and build system rather mystifying and I'd like to figure it out for future PRs.

RobinMorisset avatar Feb 27 '24 14:02 RobinMorisset

-pa ../../tests/test_server This needs to be an absolute path for a to me unknown reason.

dgud avatar Feb 27 '24 15:02 dgud

It worked! Thanks for the help. Unfortunately, the race that reproduces every few thousand iterations in release mode stubbornly refuses to reproduce in debug mode, even with 1M iterations. Still it is good to know how to run tests in debug mode.

RobinMorisset avatar Feb 27 '24 16:02 RobinMorisset

It worked! Thanks for the help. Unfortunately, the race that reproduces every few thousand iterations in release mode stubbornly refuses to reproduce in debug mode, even with 1M iterations. Still, it is good to know how to run tests in debug mode.

That is not surprising. The debug VM does a lot of extra checks that can radically change the timing.

sverker avatar Feb 27 '24 17:02 sverker

make TYPE=debug
make stdlib_test TYPE=debug ARGS="-suite ets_SUITE -case racy_rename"

as per https://github.com/erlang/otp/blob/master/HOWTO/DEVELOPMENT.md#types-and-flavors

garazdawi avatar Feb 27 '24 18:02 garazdawi

@RobinMorisset bumping this is case you have time to complete this PR :) thanks

lpgauth avatar Jul 12 '24 14:07 lpgauth

@sverker I just rebased to fix the trivial conflict in the test suite. I've also re-measured the performance impact of this PR in production, this time on more and larger machines, and it is still reliably a 1.5% win. Is there any blocker left to merging this?

RobinMorisset avatar Sep 23 '25 16:09 RobinMorisset

@RobinMorisset Sorry for letting this linger. I need to schedule time to do a proper review and assure myself that all the necessary memory barriers are in place without race conditions. What kind of benchmark was the 1.5% win? An ETS heavy "real" application or a synthetic micro benchmark pounding on name lookup and maybe table renames/create/deletes?

Also, I think John did something similar during a profiling and optimization session. Should maybe look at what he did, if it still exists.

sverker avatar Sep 30 '25 13:09 sverker

It was a very "real" application: our largest service in production. I measured the 1.5% win in two ways: comparing 4 machines with the change to 4 machines without at very high throughput (~90% cpu usage, with the 4 machines with the change having between 1 and 2% less cpu usage overall), and looking at profiles (db_get_table falls from 3.3% of total cpu usage to 1.8%)

RobinMorisset avatar Sep 30 '25 16:09 RobinMorisset