Fixes issue #8807 nexthop failure threshold in ATS 9.1.x

Open jrushford opened this issue 3 years ago • 12 comments

Fixes #8807. I've asked Jeremy Payne, jp557198, to test and comment here as he is the issue owner.

This was originally fixed in ATS 9.2.x with PR #8365, but #8365 was large and was not backported to 9.1.x: it was a late PR and zwoop did not want to bring it into 9.1.x at that time.

Anyway, this fixes the issue where the failure count on a parent is not incremented properly.
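
For readers unfamiliar with the mechanism, here is a minimal sketch of how a per-parent failure threshold generally works. It is not the ATS nexthop code: `ParentHealth`, `record_failure`, and the `fail_threshold` parameter are hypothetical names chosen for illustration. The point is only that every failed attempt has to bump the counter, otherwise the threshold is never reached and the parent is never marked down.

```cpp
#include <cstdint>

// Hypothetical per-parent health record (illustration only, not an ATS type).
struct ParentHealth {
  uint32_t fail_count = 0;    // failures seen since the parent was last healthy
  bool     available  = true; // false once the threshold has been reached
};

// Record one failed attempt against a parent.
// Returns true only on the call that crosses the threshold and marks the parent down.
bool
record_failure(ParentHealth &p, uint32_t fail_threshold)
{
  ++p.fail_count; // must happen on every failure, or the threshold is never reached
  if (p.available && p.fail_count >= fail_threshold) {
    p.available = false;
    return true;
  }
  return false;
}
```

With a threshold of 3, the first two calls leave the parent available and the third marks it down; later failures still increment the counter but do not report a new transition.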

jrushford avatar Apr 28 '22 18:04 jrushford

[approve ci fedora]

jrushford avatar Apr 28 '22 20:04 jrushford

[approve ci rocky]

jrushford avatar Apr 28 '22 20:04 jrushford

[approve ci rocky]

jrushford avatar Apr 28 '22 21:04 jrushford

[approve ci fedora]

jrushford avatar Apr 29 '22 12:04 jrushford

[approve ci rocky]

jrushford avatar Apr 29 '22 12:04 jrushford

[approve ci fedora]

traeak avatar Apr 29 '22 14:04 traeak

[approve ci rocky]

traeak avatar Apr 29 '22 14:04 traeak

[approve ci fedora]

jrushford avatar Apr 29 '22 14:04 jrushford

Both the Fedora and Rockylinux builds seem to be failing due to something introduced by this patch, not transient CI issues.

Fedora: https://ci.trafficserver.apache.org/job/Github_Builds/job/fedora/881/console

In file included from /usr/include/signal.h:328,
                 from ../../../tests/include/catch.hpp:8034,
                 from unit_tests/unit_test_main.cc:25:
../../../tests/include/catch.hpp:10822:58: error: call to non-'constexpr' function 'long int sysconf(int)'
10822 |     static constexpr std::size_t sigStackSize = 32768 >= MINSIGSTKSZ ? 32768 : MINSIGSTKSZ;
      |                                                          ^~~~~~~~~~~
In file included from /usr/include/bits/sigstksz.h:24,
                 from /usr/include/signal.h:328,
                 from ../../../tests/include/catch.hpp:8034,
                 from unit_tests/unit_test_main.cc:25:
/usr/include/unistd.h:640:17: note: 'long int sysconf(int)' declared here
  640 | extern long int sysconf (int __name) __THROW;
      |                 ^~~~~~~
In file included from unit_tests/unit_test_main.cc:25:
../../../tests/include/catch.hpp:10881:45: error: size of array 'altStackMem' is not an integral constant-expression
10881 |     char FatalConditionHandler::altStackMem[sigStackSize] = {};
      |                                             ^~~~~~~~~~~~
make[2]: *** [Makefile:974: unit_tests/test_tscpputil-unit_test_main.o] Error 1
make[2]: Leaving directory '/home/jenkins/workspace/Github_Builds/fedora/src/src/tscpp/util'
make[1]: *** [Makefile:1313: check-am] Error 2
make[1]: Leaving directory '/home/jenkins/workspace/Github_Builds/fedora/src/src/tscpp/util'
make: *** [Makefile:865: check-recursive] Error 1

RockyLinux: https://ci.trafficserver.apache.org/job/Github_Builds/job/rocky/279/console

FAIL: test_librecords
=====================

/home/jenkins/workspace/Github_Builds/rocky/src/lib/records/.libs/test_librecords: error while loading shared libraries: libssl.so.81.1.1: cannot open shared object file: No such file or directory
FAIL test_librecords (exit status: 127)

OK, never mind. I didn't notice that this patch targets 9.1.x. The 9.1.x branch probably needs an updated Catch for the newer compilers.
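
For context on the Fedora error above: the log shows `MINSIGSTKSZ` expanding to a `sysconf()` call, so it can no longer appear in a `constexpr` initializer or as an array bound. Below is a minimal sketch of one way to sidestep that by sizing the alternate signal stack at run time. The helper names `alt_stack_size` and `make_alt_stack` are hypothetical, and this is not necessarily how newer Catch releases handle it.

```cpp
#include <algorithm>
#include <csignal>
#include <cstddef>
#include <vector>

// Hypothetical helper: pick the alternate signal stack size at run time.
// MINSIGSTKSZ may be a compile-time constant or a sysconf() call depending on
// the C library, so it is evaluated in an ordinary function, not a constexpr.
std::size_t
alt_stack_size()
{
  const std::size_t preferred = 32768; // same 32 KiB floor the failing line uses
  return std::max(preferred, static_cast<std::size_t>(MINSIGSTKSZ));
}

// Hypothetical helper: allocate the buffer dynamically instead of declaring
// a fixed-size array whose bound would have to be a constant expression.
std::vector<char>
make_alt_stack()
{
  return std::vector<char>(alt_stack_size());
}
```

The returned buffer's `data()` and `size()` could then be used to fill in a `stack_t` for `sigaltstack()`.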

bneradt avatar Apr 29 '22 15:04 bneradt

Through no fault of this PR, the 9.1.x branch needs a few PRs merged in before these unit tests will build and run with the fedora:35 and rockylinux compilers:

  • #8231
  • #8683
  • #8814
  • #8765

Once those test-only changes are merged in, then the unit tests should finish successfully for this PR (they did for me locally).

bneradt avatar Apr 29 '22 20:04 bneradt

I tested your code changes against our internal branch. The failure threshold now increments as expected. The debug output linked below confirms this.

https://github.com/apache/trafficserver/issues/8807#issuecomment-1115427593

jp557198 avatar May 03 '22 12:05 jp557198

[approve ci autest]

randall avatar Jun 15 '22 18:06 randall

This pull request has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.

github-actions[bot] avatar Sep 14 '22 02:09 github-actions[bot]