DBD-MariaDB icon indicating copy to clipboard operation
DBD-MariaDB copied to clipboard

Fix InactiveDestroy

Open rovo89 opened this issue 2 years ago • 16 comments

These commits fix the "fast" cleanup when attribute InactiveDestroy is set.

rovo89 avatar Nov 28 '22 15:11 rovo89

I'm still looking for a good way to avoid $sth->finish() being called on destroy when InactiveDestroy is set. This is important when trying to disconnect without fetching all pending results and possible in some cases where a child process is forked and joined while fetching results.

rovo89 avatar Nov 28 '22 15:11 rovo89

@pali Could you have a look please? Thanks!

rovo89 avatar Nov 29 '22 15:11 rovo89

I have looked at it but I'm really not sure if this work correctly. There are lot of pitfalls and edge cases. I think that I had in my mind some solution for this issue but I have never implemented it. Moreover such feature must be covered by tests and currently project lacks big CI testing with different Perl versions, different mysql/mariadb client versions and server versions. Travis is already gone, so it needs to be replaced by for example Github actions. And just now I fixed AppVeyor CI for Windows builds. So I would propose to postpone this change after CI matrix testing is again working to prevent any breakage. I'm not really sure that changing code related to PL_dirty does not cause segfault in some library combination.

pali avatar Nov 29 '22 21:11 pali

I can understand the need for tests. Even for a single database though, it will be quite hard to verify correctness.

For a statement handle destroy, I think it's required to:

  • Bring the DB into a state where it can execute other statements again (e.g. read pending rows and results from the socket)
  • For server side prepare: Tell the database were not going to use the statement anymore (mysql_stmt_close)
  • Free structures allocated via MariaDB API (e.g. mysql_free_result, mysql_stmt_free_result)
  • Free DBI-related structures

For a database handle destroy:

  • Destroy all statements (should be ensured by Perl reference counting)
  • Tell the database we're going to disconnect (mysql_close)
  • Free structures allocated via MariaDB API (but I don't think there are any for the database handle?)
  • Free DBI-related structures

If InactiveDestroy is set, any communication with the database must be skipped. The socket is shared, so if one process implicitly destroys all DBI objects on exit, it must be ensured that it does not:

  • Read anything (!) from the socket (e.g. the pending results), otherwise the parent process will wait for the data infinitely. A side-effect is that the exit will be faster because it doesn't need to process potentially huge results. I had a case where I set InactiveDestroy even on the parent process. Processing millions of rows takes a while, and if the user pressed Ctrl+C it would need to fetch all pending rows from the database before destroying everything. Setting the attribute made the process finish immediately.
  • Tell the database to destroy anything statement- or database-related, because it's still in use by the parent process.
  • Tell the database we're going to disconnect, otherwise the server closes the connection and the parent process gets a "Lost connection" error.

The problem is that I don't think there is a way to check whether communication over the socket has happened (or not), whether there is still anything allocated on DB side or via the MariaDB API (or not, if we still need it) and whether DBI structures are still allocated.

I think we can only test the side-effects, i.e. connect, send a query, fork, wait for the child to exit, and see if we can still fetch the results. That's basically the two examples in my commit messages. I can try to pack that into a test case. A little problem is that with the implicit $sth->finish(), the child process has already read the pending result from the socket, and the parent process will hang polling for that exact same information. Maybe a timeout would do?

These test cases can only verify though that not too much was freed, but not if something was missed to be freed.

rovo89 avatar Nov 30 '22 08:11 rovo89

I'm not really sure that changing code related to PL_dirty does not cause segfault in some library combination.

As far as I understood the comment, the purpose was to avoid accessing $dbh during global destruction as it might have already been destroyed. However the only remaining code doesn't need the database handle, it just frees the memory allocated by the MariaDB API. It wouldn't hurt much to keep the check because the whole process will anyway be destroyed milliseconds later, but I don't think it's necessary.

DBI already takes care to skip the $sth->finish() call when it's not safe/required:

  • PL_dirty
  • database handle inactive
  • statement handle inactive
  • InactiveDestroy set on the statement handle (which effectively turns the active flag off during destroy)

Point of the commit is not to duplicate this work, but only free what might still be allocated.

rovo89 avatar Nov 30 '22 08:11 rovo89

  • Tell the database we're going to disconnect (mysql_close)

This is an issue. You must call mysql_close (or similar function) to release memory allocated for $dbh even when InactiveDestroy is set. Otherwise you introduce memory leak.

And I think this is the reason why InactiveDestroy was not properly implemented yet.

pali avatar Dec 04 '22 11:12 pali

Thank you for your work on this. I ran into this bug with my own code and wrote a test script (thinking I was going to write a new bug report).

My test script was showing two different types of failure, apparently at random:

  • The child closes the connection to the server, resulting in the parent failing, as reported in this PR.
  • The child reports an error instead: panic: DBI active kids (-1) < 0 or > kids (0)

I tested this PR (cherry-picked on top of current master), and the child no longer closes the connection, which is good. I still get the error from the child about half the time, though. I don't know if this is a separate bug or if it is somehow related. It doesn't happen at all with DBD::mysql.

Here is my test script (renamed as .txt to get around github's filtering): forktest.pl.txt

Here is a good example run, using patched DBD::MariaDB::

$ strace -f -o /tmp/strace.ok.log ./forktest.pl -d MariaDB -f -a
connect to: DBI:MariaDB:;host=localhost
child 480267 : initiating normal exit
parent 480266 : child process complete
parent 480266 : exit

strace.ok.log

Here is a failing example run, also using patched DBD::MariaDB:

$ strace -f -o /tmp/strace.bad.log ./forktest.pl -d MariaDB -f -a
connect to: DBI:MariaDB:;host=localhost
child 480286 : initiating normal exit
panic: DBI active kids (-1) < 0 or > kids (0) at /usr/lib/x86_64-linux-gnu/perl5/5.36/DBI.pm line 759.
END failed--call queue aborted.
parent 480285 : child process complete
parent 480285 : exit

strace.bad.log

I'm using perl v5.36.0 (Debian package 5.36.0-6) with DBI version 1.643 (Debian package 1.643-4).

Thanks, Corey

bugfood avatar Dec 28 '22 22:12 bugfood

@rovo89 you can use a hack/trick with invalidating file descriptor before calling mysql_close() when you need connection to stay open but want to free mysql client resources. Current master branch already contains similar hack/trick for Windows builds due to dichotomy between file descriptor and socket handle in Perl API and Windows API. This could be a way how to properly implement InactiveDestroy.

pali avatar Aug 26 '23 21:08 pali

We use this feature in Request Tracker, so we can add some possible additional test cases. We are working on a branch to support running with DBD::MariaDB and have gotten most of our tests passing. One remaining failure is one that relies on InactiveDestroy. We have some code that uses it here:

https://github.com/bestpractical/rt/blob/6b3a2158b5d502a9115a59efb5c550050f9b72f2/lib/RT/Util.pm#L60

And tests here:

https://github.com/bestpractical/rt/blob/5.0/db-type-mariadb/t/api/safe-run-child-util.t

Was InactiveDestroy working in the past in DBD::mysql and something changed in MariaDB?

cbrandtbuffalo avatar Jan 19 '24 20:01 cbrandtbuffalo

@cbrandtbuffalo: The code comments suggest InactiveDestroy never worked properly in DBD::mysql, either. There were resource leaks preventing the connection from closing (regardless of InactiveDestroy). The leaks were fixed in DBD::MariaDB, which means the connection is now always closed.

choroba avatar Feb 12 '24 21:02 choroba

@cbrandtbuffalo: Can you check the WIP change here with your tests? https://github.com/perl5-dbi/DBD-MariaDB/compare/master...choroba:inactive-destroy?expand=1

choroba avatar Feb 12 '24 21:02 choroba

Yes, I can take a look. All of our tests run in github actions, so the environment is available in the RT github repo. To test, it looks like I'll need to create a new build of the base Debian docker image using your PR. I'll comment if I'm able to run a test.

cbrandtbuffalo avatar Feb 13 '24 14:02 cbrandtbuffalo

The RT tests pass for me when running your branch locally on a Mac. For some reason, the tests still fail in github actions using this docker image: https://hub.docker.com/layers/bpssysadmin/rt-base-debian/RT-5.0.3-buster-testmariadb2-20240215/images/sha256-838a7cb36dad1a2883c30dab9ac6e94cf30d354efdabe2214cf122d9fb5241a5?context=repo I'm still working on debugging. It's possible the build isn't correctly grabbing and building the branch.

cbrandtbuffalo avatar Feb 16 '24 20:02 cbrandtbuffalo

We fixed the docker image and can confirm the branch allows our RT tests to pass. You can see the green checkmark here: https://github.com/bestpractical/rt/tree/5.0/db-type-mariadb. As noted before, we also see passing tests running locally on Mac OS X. We also confirmed it passed on Ubuntu.

cbrandtbuffalo avatar Feb 20 '24 17:02 cbrandtbuffalo

@pali Do you have other thoughts on this branch based on the testing noted above? Our use case for this feature is we have a running web server process (RT) which can fork other processes. When the forked process exits, we don't want it to disconnect the active DB connection the main RT process is using, so we use this flag.

cbrandtbuffalo avatar Mar 01 '24 15:03 cbrandtbuffalo

I noticed some activity with a few commits from @Tux and @choroba so I wanted to ping this PR just in case there might be a release in the future. I know people have had limited time, so I'm happy to help if there's anything do.

cbrandtbuffalo avatar Jul 03 '24 19:07 cbrandtbuffalo