DBD-Oracle
At times Segfault during deconstruction after upgrade from 1.76 to 1.80
Via the changelog we found a possibly relevant change: https://metacpan.org/diff/file?target=MJEVANS/DBD-Oracle-1.80/&source=ZARQUON%2FDBD-Oracle-1.76#dbdimp.c
We use it in a rather complex internal tool and the segfault sometimes happens at the very end. Still we are able to consistently reproduce it.
How could we support you in finding the root cause?
Can you provide a script that causes the segfault?
Unfortunately not really. The script is huge (~70'000 lines) and does a lot. It only fails on certain data, and the difference is not easy to pin down because the script runs many queries and only segfaults during teardown of the driver. Is there an easy way to log just the SQL interaction, to find the differences between working and non-working runs?
You can enable tracing? https://metacpan.org/pod/DBI#trace
I created 2 trace files, one from a run of the script that segfaults and one from a run where it doesn't. The trace flag used was 'DBD'. traces.zip
We're having the same issue with every test case that uses DBIx::Class, for example via Test::WWW::Mechanize::Catalyst.
Our workaround is to call $schema->storage->disconnect before done_testing.
The trace files are quite big; each contains roughly 2 million lines. Can you repeat your test with DBI_TRACE=5 perl your_test_script.pl and share the part with the DESTROY lines (as shown in https://github.com/perl5-dbi/DBD-Oracle/issues/65#issuecomment-453022681) plus the last 100 lines before them?
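For anyone who wants to cut a large trace down to the requested excerpt, here is a minimal sketch in plain Perl. It assumes the relevant trace lines contain the literal string DESTROY, and the function name destroy_context and the file names are made up for illustration. It reads the whole trace into memory, which may be slow for traces with millions of lines.

```perl
use strict;
use warnings;

# Return the trace lines starting $before lines ahead of the first DESTROY
# mention and running to the end of the trace; empty list if none is found.
# Assumption: DESTROY calls show up in the trace as the literal text "DESTROY".
sub destroy_context {
    my ($lines, $before) = @_;
    $before = 100 unless defined $before;

    for my $i (0 .. $#{$lines}) {
        next unless $lines->[$i] =~ /DESTROY/;
        my $start = $i - $before;
        $start = 0 if $start < 0;
        return @{$lines}[$start .. $#{$lines}];
    }
    return;
}

# Hypothetical usage: perl destroy_context.pl trace.log > excerpt.log
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my @lines = <$fh>;
    close $fh;
    print destroy_context(\@lines);
}
```

The trace itself can be written to a file by setting DBI_TRACE=5=trace.log in the environment before running the script; trace.log is just an example name.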
Does anybody have a stack trace or a core dump?
Yes, an explicit disconnect, e.g. with $dbh->disconnect, prevents the segfault. That matches the experience in #65.
Made new traces with trace level 5. traces.zip
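One way to make sure the explicit disconnect always happens is to do it from an END block, which runs before global destruction starts. The following is only a sketch of that idea: track_handle, disconnect_tracked and FakeHandle are hypothetical names, and FakeHandle is a stand-in so the code runs without a database. With real DBI handles the only method relied on is $dbh->disconnect. Whether this avoids the crash in every setup has not been verified here.

```perl
use strict;
use warnings;

# Handles to disconnect before the interpreter reaches global destruction.
my @tracked;

# Register a handle; returns it so the call can wrap a connect expression.
sub track_handle {
    my ($h) = @_;
    push @tracked, $h;
    return $h;
}

# Disconnect every tracked handle, newest first; return how many were closed.
sub disconnect_tracked {
    my $closed = 0;
    while (my $h = pop @tracked) {
        next unless $h->can('disconnect');
        $closed++ if eval { $h->disconnect; 1 };
    }
    return $closed;
}

# END blocks run before global destruction begins.
END { disconnect_tracked() }

# Hypothetical stand-in for a database handle (not part of DBI).
package FakeHandle;
sub new        { return bless { Active => 1 }, shift }
sub disconnect { $_[0]{Active} = 0; return 1 }

package main;

# With DBI this would be: track_handle(DBI->connect($dsn, $user, $password))
my $dbh = track_handle(FakeHandle->new);
```

Handles are closed newest first so that a handle created later is never left open after an earlier one is gone; that ordering is a design choice of this sketch, not something the thread prescribes.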
/bin/sh: line 1: 90272 Segmentation fault (core dumped)
I have this random issue too, but I'm not sure whether it was caused by the upgrade from oracle-instantclient12.2 to oracle-instantclient19.6 or by the upgrade from perl-DBD-Oracle-1.74-12.2.0.1.0 to perl-DBD-Oracle-1.80-19.6.0.0.0.
I'll try to downgrade perl-DBD-Oracle to see if we still get the random segfaults, but it is weird.
@mrdvt92 it would almost certainly be 1.80 of DBD::Oracle
It happens for me as well in 1.791.
In my case I'm able to recreate it in situations where there are multiple connections, at least one of them lives outside the main script, and no disconnect is called.
Example:
connect.pl
use DBI;
use DBD::Oracle;
$dbh = DBI->connect("dbi:Oracle:$DATABASE", $USER, $PASSWORD);
require("connect.inc");
#$dbh->disconnect;
connect.inc
$dbh2 = DBI->connect("dbi:Oracle:$DATABASE", $USER, $PASSWORD);
Uncommenting $dbh->disconnect fixes the segfault in this example. Giving $dbh2 local scope also fixes it.
Perl 5.30.0 (with threads), DBD::Oracle 1.791, InstantClient 12.2.0.1.0
I am also observing this issue in a module with multiple Oracle connections. Using installs from BackPAN I was able to narrow it down to a change between versions 1.75_2 (no segmentation fault) and 1.77_1 (segmentation fault). [There were no versions available in between.]
I also see the segmentation fault go away with an explicit disconnect in 1.77_1 and later.
Perl 5.30.2 (no threads) [also observed for 5.30.1 with threads], InstantClient 12_2
We just ran into this issue after upgrading to the 19c client. Here is what I sent to the dbi-users list:
More info: this error does not occur with DBD::Oracle 1.76.
DBD::Oracle 1.80 => works with 18c client, but fails with 19c.
DBD::Oracle 1.76 => works with all client versions.
On 6/19/20 5:48 PM, Scott wrote:
We have run into an issue when we upgraded to Oracle client 19c. Some of the users' processes are segfaulting on exit.
#0 0x00007f82ee84ccc0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x00007f82e6444f43 in kputxabt () from /u01/app/oracle/product/19.3.0.0/lib/libclntsh.so.19.1
#2 0x00007f82e926e6c3 in ora_db_rollback () from /usr/local/perl-5.22.0-thr/lib/site_perl/5.22.0/x86_64-linux-thread-multi/auto/DBD/Oracle/Oracle.so
#3 0x00007f82e9266b11 in XS_DBD__Oracle__db_DESTROY () from /usr/local/perl-5.22.0-thr/lib/site_perl/5.22.0/x86_64-linux-thread-multi/auto/DBD/Oracle/Oracle.so
#4 0x00007f82ed10291d in XS_DBI_dispatch () from /usr/local/perl-5.22.0-thr/lib/site_perl/5.22.0/x86_64-linux-thread-multi/auto/DBI/DBI.so
I tested the same process on a server still using the 18c client and the core dump does not happen.
I assume this is the change causing the segfault with the 19c client:
Destroy envhp with last dbh (GH#93, GH#89, Dean Hamstead, CarstenGrohmann)
This appears to have something to do with global destruction. The following code segfaults:
use DBI;
use DBD::Oracle;

{
    $dbh = DBI->connect("dbi:Oracle:XEPDB1", 'db', 'password');
    $dbh2 = DBI->connect("dbi:Oracle:XEPDB1", 'db', 'password');
    print "dbh = $dbh\n";
    print "dbh2 = $dbh2\n";
}

whereas the following code does not:

use DBI;
use DBD::Oracle;

{
    my $dbh = DBI->connect("dbi:Oracle:XEPDB1", 'db', 'password');
    my $dbh2 = DBI->connect("dbi:Oracle:XEPDB1", 'db', 'password');
    print "dbh = $dbh\n";
    print "dbh2 = $dbh2\n";
}
So there must be some object that's being destroyed in the wrong order when global destruction happens. (Tested on Perl 5.16.3, CentOS 7.8, DBD::Oracle 1.80, Oracle 18c)
I added some debugging code. The one that does not segfault (with the my variables) prints this:
In destructor: Calling dbd_db_disconnect
SessionEnd: 0x18c0a30 0x1890608 0x18906f0 0x188f918
In destructor: Back from dbd_db_disconnect
In destructor: Calling dbd_db_disconnect
SessionEnd: 0x176cba0 0x1879fb8 0x187a0a0 0x1882f40
In destructor: Back from dbd_db_disconnect
In dbd_dr_destroy
The one that does segfault prints this:
In destructor: Calling dbd_db_disconnect
SessionEnd: 0xa27c10 0x9f77e8 0x9f78d0 0x9f6af8
In destructor: Back from dbd_db_disconnect
In dbd_dr_destroy
In destructor: Calling dbd_db_disconnect
SessionEnd: 0x8d3d10 0x9e1198 0x9e1280 0x9ea120
Segmentation fault
Notice how in the one that segfaults, dbd_dr_destroy is called before the second $dbh destructor is called. The global destructor is destroying objects in the wrong order.
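The difference between the two scripts can be seen without a database. The sketch below uses a made-up Handle class as a stand-in for a database handle and records the interpreter phase in which each object is destroyed, using ${^GLOBAL_PHASE} (available since Perl 5.14). It only illustrates when destruction happens for lexical versus package variables; it does not reproduce the DBD::Oracle crash itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle; it only records when it dies.
package Handle;
our @log;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

sub DESTROY {
    my ($self) = @_;
    my $entry = $self->{name} . ' destroyed in phase ' . ${^GLOBAL_PHASE};
    push @log, $entry;
    print "$entry\n";
}

package main;

# Package variable: stays alive until the interpreter shuts down.
our $global_handle = Handle->new('global');

{
    # Lexical variable: freed as soon as the enclosing block ends.
    my $lexical_handle = Handle->new('lexical');
}

print "end of script\n";
```

On Perl 5.14 or later the lexical object should report phase RUN before "end of script" is printed, while the package variable should report phase DESTRUCT afterwards. Perl does not guarantee any particular order among objects destroyed during global destruction, which is consistent with dbd_dr_destroy running before the second handle's destructor in the failing case.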
The attached patch fixes the problem for me. I would not say I'm particularly happy with this patch; I see it more as a workaround than a proper fix, but I'm attaching it for anyone who wants to try it out.
Unfortunately the patch did not work for my case. I still got the same seg fault.
It would be nice to have a proper fix for this because as it is now my $work is locked in at v1.76.
What perl and oracle versions did people try with this patch?
On 2020-08-07 17:47, Wesley Hinds wrote:
Unfortunately the patch did not work for my case
Are you sure you ran it against the patched version? I copied exactly your case, and this is what I get:
First, I run it against the original DBD::Oracle version 1.80 and I get the segmentation fault:
make -C DBD-Oracle-1.80-ORIG install
perl connect.pl
Segmentation fault
Next, I run it against the patched version and there's no segfault:
make -C DBD-Oracle-1.80 install
perl connect.pl
connect.pl and connect.inc are exactly as you posted. Here's connect.pl:
use DBI;
use DBD::Oracle;

$dbh = DBI->connect("dbi:Oracle:XEPDB1", 'rtdb1', 'password');
require("connect.inc");
and here is connect.inc:
$dbh2 = DBI->connect("dbi:Oracle:XEPDB1", 'rtdb1', 'password');
This is on CentOS Linux release 7.8.2003, and Perl 5.16.3.
Regards,
Dianne.
What perl and oracle versions did people try with this patch?
CentOS 7.8.2003 Perl 5.30.0 DBI 1.642 InstantClient 12.2 Oracle Database 19c
CentOS 8.2.2004 Perl 5.32.0 DBI 1.643 InstantClient 19.8 Oracle Database 19c
I tried with both my case and the @dfskoll case. I'm not using Oracle XE btw.
I applied the patch correctly. I don't know what I'm doing wrong, it seems like it should work. Maybe someone else can give it a go.
The attached patch fixes the problem for me. I would not say I'm particularly happy with this patch; I see it more as a workaround than a proper fix, but I'm attaching it for anyone who wants to try it out.
Can you flip this over to a pull request? That will have it run through Travis
Hi,
I tried to create a pull request, but I lack permission.
$ git push --set-upstream origin work-around-segfault-on-handle-destruction
Username for 'https://github.com': dfskoll
Password for 'https://dfskoll@github.com':
remote: Permission to perl5-dbi/DBD-Oracle.git denied to dfskoll.
fatal: unable to access 'https://github.com/perl5-dbi/DBD-Oracle/': The requested URL returned error: 403
Regards,
Dianne.
Were you pushing to your own fork? If not, try that, and then on your repo's page a button that will create a PR in the perl5-dbi/DBD-Oracle repo should magically appear.
Chris
-- https://twitter.com/ghrd
@mjegh are you around? It's looking like time for a release.
I'm not sure I can. I retired and don't have access to Oracle now, so I cannot even run the test suite. Also, the Linux machine I did the build on was at work. I might be able to get access for a while at the weekend. Can you point me at the Dist::Zilla instructions you gave me before? I can't find them. I'll try to work out what has changed, as I've not been keeping up.
On 2020-08-12 00:48, Christopher Jones wrote:
Were you pushing to your own fork? If not, try that, and then on your repo's page a button that will create a PR in the perl5-dbi/DBD-Oracle repo should magically appear.
OK, thank you! I'm relatively new to the Github workflow. That worked, and I've created the PR.
Regards,
Dianne.
@dfskoll you're doing well and you'll get it. Thanks for your great contributions so far.
@mjegh congrats on retiring? for convenience you could use the Dockerfile at https://github.com/djzort/DBD-Oracle/tree/master/maint that will create an environment with perl 5.32 and Oracle-XE ready to go. See also PR #118
I found this bug independently, and at first I thought it was provoked by (highly dubious) local $object->{cached_dbh} sort of things.
Now I'm finding that both 1.76 and 1.80 can be provoked to segfault, under various different circumstances. In addition I'm seeing 1.76 is sometimes a 50:50 Heisenbug for whether it crashes.
Are you still looking at this? How do you want the cut-down test code I put together?
Hi, Matthew,
I'm not actively looking at this any more. If you apply the patch at https://github.com/perl5-dbi/DBD-Oracle/pull/121/commits/5d98d93bcedf3317f4ff739841162b521403662a does it stop the segfaults? (I'm not sure if the patch will apply against 1.76.)
Regards,
Dianne.
In terms of managing expectations, there is no sponsor or full time maintainer for this module.