segfaults and other stability issues

Open AlexDaniel opened this issue 9 years ago • 34 comments

See this: http://irclog.perlgeek.de/perl6/2016-08-19#i_13055233

Pretty sure that it is not our fault, but we have to rakudobug it.

AlexDaniel avatar Aug 19 '16 18:08 AlexDaniel

  • [x] Seems like RT #129291 is the most common problem at this moment. Once that is fixed, we will probably see other issues.

AlexDaniel avatar Sep 19 '16 02:09 AlexDaniel

AlexDaniel avatar Oct 08 '16 19:10 AlexDaniel

  • [x] RT #129781 was fixed. The next problem is that the process is not killed if there's a lot of output on the stdout of Proc::Async. See RT #130370. It's not blocking us, though, because a workaround was added in commit c564d8de71e5049e8c93d760fbc6af3316326996.

AlexDaniel avatar Dec 17 '16 00:12 AlexDaniel

As of today, there are no segfaults. I'd still have to write tests for some cases mentioned here, but generally it is not an issue anymore.

AlexDaniel avatar Feb 09 '17 12:02 AlexDaniel

  • [x] OK, bots are not stable anymore. I think it's due to https://github.com/rakudo/rakudo/commit/9658dd98c9.

Getting stuff like this:

MoarVM panic: Internal error: invalid thread ID 284 in GC work pass

Didn't look into it deeply at all, but leaving a note here anyway.

Can be reproduced by running t/bisectable.p6 on the server (sometimes you may get lucky and the whole file will pass, but usually it crashes halfway through).

AlexDaniel avatar Jul 27 '17 18:07 AlexDaniel

I’m getting those a lot too (it happens while debugging HTTP::Server::Async issues)

tony-o avatar Jul 27 '17 18:07 tony-o

Seems to be alright now after fixes by @jnthn++.

AlexDaniel avatar Jul 28 '17 17:07 AlexDaniel

  • [x] Currently most bots are leaking memory (which is why some things are slower than they were before).

AlexDaniel avatar Aug 06 '17 16:08 AlexDaniel

The leak was reported in RT #131879, and it is now partially fixed: it no longer leaks as much. Memory usage still increases if you keep throwing non-existent commits at the bots, but given 16GB of RAM on the server this is hardly a problem.

Right now, the bots are stable.

AlexDaniel avatar Aug 14 '17 22:08 AlexDaniel

  • [x] Quotable does not work (and has not been working for a while): RT #131961. Greppable is also affected by it, but it is more or less usable.

AlexDaniel avatar Aug 26 '17 02:08 AlexDaniel

RT #131961 is resolved, waiting for the next bug to appear now.

AlexDaniel avatar Sep 05 '17 01:09 AlexDaniel

  • [x] Well, didn't have to wait too long. Most bots can't pass their tests, and I don't know why yet. Things seem to hang.

AlexDaniel avatar Sep 05 '17 05:09 AlexDaniel

AlexDaniel avatar Sep 05 '17 07:09 AlexDaniel

Well, I guess it's not. Things are still broken though.

AlexDaniel avatar Sep 26 '17 15:09 AlexDaniel

  • [x] Also, we're stuck with a non-HEAD version of rakudo because of RT #132191. See also https://github.com/zoffixznet/perl6-IRC-Client/issues/51.

AlexDaniel avatar Oct 01 '17 04:10 AlexDaniel

OK, RT #132191 turned out to be an issue in IRC-Client (it was relying on a rakudo bug).

Now there are at least two other problems. Bisectable fails with this output:

ok 60 - Did you mean “HEAD” (new)?
# Failed to get expected result in 11.04535627 seconds (11 nominal)
not ok 61 - Did you mean “HEAD” (old)?
# Failed test 'Did you mean “HEAD” (old)?'
# at /home/bisectable/git/whateverable/t/lib/Testable.pm6 (Testable) line 81
# expected: ["testable742093, Cannot find revision “DEAD” (did you mean “HEAD”?)"]
#  matcher: 'infix:<~~>'
#      got: []
# Test failed. Stopping test suite, because PERL6_TEST_DIE_ON_FAIL environmental variable is set to a true value.
# Failed to get expected result in 11.04317088 seconds (11 nominal)
not ok 62 - _
# Failed test '_'
# at /home/bisectable/git/whateverable/t/lib/Testable.pm6 (Testable) line 81
# expected: [-> ;; $_? is raw { #`(Block|84942264) ... }]
#  matcher: 'infix:<~~>'
#      got: []
# Test failed. Stopping test suite, because PERL6_TEST_DIE_ON_FAIL environmental variable is set to a true value.

There is no reason why test 61 would fail. Actually, it passes if you put it higher in that file. I don't know what's going on there, but most likely it's an issue in rakudo.

The second problem is that it runs some other test after the first test failed. Why? It should not be like that.

AlexDaniel avatar Oct 21 '17 12:10 AlexDaniel

Ah OK, the ‘_’ test is an issue in whateverable. Never mind that. Why it fails in the first place is beyond me, however.

AlexDaniel avatar Oct 21 '17 12:10 AlexDaniel

  • [x] So, to make this clear, the tests are still failing. It simply stops dead when performing these tests: https://github.com/perl6/whateverable/blob/e9ccebadca9a44e4a27a2325737308828568786b/t/bisectable.t#L165-L170

The code involves a lot of calls to the Text::Diff::Sift4 module, but nothing special really. This issue didn't exist a few releases ago, and I'm really not sure when exactly it started happening.

The same test works fine in committable.t, and actually if you move these tests higher in the bisectable.t file, they will pass. Really weird stuff going on.

AlexDaniel avatar Nov 19 '17 01:11 AlexDaniel

Alright, some progress on that! First of all, it doesn't hang, it segfaults. I thought it was hanging because the test suite does not really detect when the bot process dies unexpectedly, so there was no easy way to notice. Now I have some code that will help notice the issue in the future, and I will commit it soon.

Now, the segfault happens in the react block here: https://github.com/perl6/whateverable/blob/e9ccebadca9a44e4a27a2325737308828568786b/lib/Whateverable.pm6#L220-L232

So, that's easy now, right? Just run it under valgrind and you'll immediately see the issue…

Ha-ha.

Nope. You run it under valgrind, and the issue goes away. 💩

I suspect that we may be seeing something like https://github.com/rakudo/rakudo/issues/1202 here, but it's hard to tell.

AlexDaniel avatar Nov 19 '17 05:11 AlexDaniel

Same issue under gdb: https://gist.github.com/MasterDuke17/0312dd2af1e3b2b498d91cfacc45343c

AlexDaniel avatar Nov 19 '17 23:11 AlexDaniel

  • [x] Reportable is currently suffering from this issue (SEGV): https://github.com/rakudo/rakudo/issues/1278

AlexDaniel avatar Nov 28 '17 19:11 AlexDaniel

  • [x] Bots are currently leaking memory like crazy. I will probably turn off some of them so that they don't max out the memory usage on the server.

AlexDaniel avatar Nov 29 '17 19:11 AlexDaniel

  • [x] Just had this intermittent failure:
Cannot find method 'specialize' on object of type NQPClassHOW

On this line: https://github.com/perl6/whateverable/blob/46337991a954885fe4c535319275bbb6f797b391/lib/Whateverable.pm6#L326

I cannot reproduce it, so we will just let it be…

AlexDaniel avatar Dec 08 '17 05:12 AlexDaniel

  • [x] MoarVM ticket that is probably related to our current memory leaks: https://github.com/MoarVM/MoarVM/issues/680

AlexDaniel avatar Jan 17 '18 16:01 AlexDaniel

  • [x] More or less isolated memory leak: https://github.com/rakudo/rakudo/issues/1501

AlexDaniel avatar Feb 08 '18 18:02 AlexDaniel

  • [x] I think it no longer leaks as much, but now bisectable segfaults here: https://github.com/perl6/whateverable/blob/177b77cb2ebc045736b8e7a1cf6eb8e25fdce7b6/t/bisectable.t#L186-L191

Again, there's nothing special about this test. And if you look closely, the previous tests have been commented out because they were causing another segfault earlier. Here's the ticket: https://github.com/rakudo/rakudo/issues/1259

AlexDaniel avatar Feb 13 '18 18:02 AlexDaniel

Bots no longer leak memory like crazy, so that issue is resolved. Bisectable still can't get through its tests though.

AlexDaniel avatar Feb 16 '18 01:02 AlexDaniel

AlexDaniel avatar Mar 06 '18 18:03 AlexDaniel

OK, issue #296 can be worked around like this:

-my $host-arch = $*KERNEL.hardware;
+my $host-arch = ‘x86_64’;
$host-arch = ‘amd64’|‘x86_64’ if $host-arch eq ‘amd64’|‘x86_64’;
-$host-arch = $*KERNEL.name ~ ‘-’ ~ $host-arch;
+$host-arch = ‘linux’ ~ ‘-’ ~ $host-arch;

Heh. Not committing this to the repo because I'm hoping it'll get resolved relatively quickly.

AlexDaniel avatar Mar 07 '18 21:03 AlexDaniel

Could you try this diff and see if that makes a difference?

$ git diff
diff --git a/src/core/Kernel.pm6 b/src/core/Kernel.pm6
index 1cde4c4..7ce4cf8 100644
--- a/src/core/Kernel.pm6
+++ b/src/core/Kernel.pm6
@@ -180,8 +180,8 @@ class Kernel does Systemic {
     }
 }

-Rakudo::Internals.REGISTER-DYNAMIC: '$*KERNEL', {
+#Rakudo::Internals.REGISTER-DYNAMIC: '$*KERNEL', {
     PROCESS::<$KERNEL> := Kernel.new;
-}
+#}

If it does, then it’s something in the auto-vivification of dynamic variables that’s to blame, and not something specific to $*KERNEL.

lizmat avatar Mar 07 '18 22:03 lizmat