scylla-cluster-tests
scylla-cluster-tests copied to clipboard
feature(artifact_test): testing the output of perftune.py
perftune.py is a script within seastar (which is a part of scylla_setup) that configures several system parameters, like the amount of CPU cores that are designated to handle IRQ, and which are not.
I have recorded the expected results as of master 20230717.567b4536892f in defaults/perftune_results.json, and the test will compare them to the output of the script.
https://github.com/scylladb/scylla-enterprise/issues/2909
PR pre-checks (self review)
- [x] I followed KISS principle and best practices
- [x] I didn't leave commented-out/debugging code
- [x] I added the relevant
backportlabels - [ ] New configuration option are added and documented (in
sdcm/sct_config.py) - [ ] I have added tests to cover my changes (Infrastructure only - under
unit-test/folder) - [ ] All new and existing unit tests passed (CI)
- [ ] I have updated the Readme/doc folder accordingly (if needed)
Job I use to test the feature: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/Shlomo/job/artifacts-ami-test_perftune/ So far, the failure is due to mismatch with the expected result written in the doc to the result in practice. @vladzcloudius Can you please check my comments?
Job I use to test the feature: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/Shlomo/job/artifacts-ami-test_perftune/ So far, the failure is due to mismatch with the expected result written in the doc to the result in practice. @vladzcloudius Can you please check my comments?
@ShlomiBalalis was it related to what we discussed on GoogleChat with you and @roydahan or was it something different?
Please, let me know if you still want me to take a look at anything at the moment?
Job I use to test the feature: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/Shlomo/job/artifacts-ami-test_perftune/ So far, the failure is due to mismatch with the expected result written in the doc to the result in practice. @vladzcloudius Can you please check my comments?
@ShlomiBalalis was it related to what we discussed on GoogleChat with you and @roydahan or was it something different?
Please, let me know if you still want me to take a look at anything at the moment?
No, it's a very old comment on a very old PR. I will update the PR in a moment
@roydahan ping
In https://github.com/scylladb/scylla-pkg/pull/3500, new instance types were added to AWS and GCE. In the latest push, I added the results of perftune on said machines
Please check the failing checks. other than that, looks good to me.
This is the only failure. Doesn't seem to be related to my PR
[2023-07-26T17:41:33.825Z] _______________________ TestFileLogger.test_file_logger ________________________
[2023-07-26T17:41:33.825Z]
[2023-07-26T17:41:33.825Z] self = <unit_tests.test_sct_events_file_logger.TestFileLogger testMethod=test_file_logger>
[2023-07-26T17:41:33.825Z]
[2023-07-26T17:41:33.825Z] def tearDown(self) -> None:
[2023-07-26T17:41:33.825Z] > self.file_logger.stop(timeout=1)
[2023-07-26T17:41:33.825Z]
[2023-07-26T17:41:33.825Z] unit_tests/test_sct_events_file_logger.py:41:
[2023-07-26T17:41:33.825Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-07-26T17:41:33.825Z]
[2023-07-26T17:41:33.825Z] self = <EventsFileLogger name='EventsFileLogger-117' pid=3104 parent=10 stopped exitcode=0 daemon>
[2023-07-26T17:41:33.825Z] timeout = 1
[2023-07-26T17:41:33.825Z]
[2023-07-26T17:41:33.825Z] def stop(self, timeout: float = None) -> None:
[2023-07-26T17:41:33.825Z] events_process_class_name = self.__class__.__name__
[2023-07-26T17:41:33.825Z] LOGGER.debug("Stopping events process %s", events_process_class_name)
[2023-07-26T17:41:33.825Z] self.terminate()
[2023-07-26T17:41:33.825Z] LOGGER.debug("Waiting for events process %s to stop", events_process_class_name)
[2023-07-26T17:41:33.825Z] self.join(timeout) # pylint: disable=no-member; both threading.Thread and multiprocessing.Process have it
[2023-07-26T17:41:33.825Z] if self.is_alive(): # pylint: disable=no-member;
[2023-07-26T17:41:33.825Z] LOGGER.error("Events process %s is still alive after timeout", events_process_class_name)
[2023-07-26T17:41:33.825Z] > assert False, f"Events process {events_process_class_name} is still alive after timeout"
[2023-07-26T17:41:33.825Z] E AssertionError: Events process EventsFileLogger is still alive after timeout
[2023-07-26T17:41:33.825Z]
[2023-07-26T17:41:33.825Z] sdcm/sct_events/events_processes.py:104: AssertionError
@vladzcloudius , could you help us here re-reviewing this PR?
@vladzcloudius already approved that the results on master can be treated as the expected results. I don't think we should wait more with this.
@vladzcloudius already approved that the results on master can be treated as the expected results. I don't think we should wait more with this.
My apologies, guys. Too much fires to put down at the same time. Yes, please, move forward for now. I'll review later and if issues found we can send a follow up fix.
...and there is a comment already. ;)
This PR is just getting rotten here. @ShlomiBalalis is not available.
We need someone to finalize the work. If we don't do it, I'm going to merge it and we will handle the failures as part of master.
This PR is just getting rotten here. @ShlomiBalalis is not available.
We need someone to finalize the work. If we don't do it, I'm going to merge it and we will handle the failures as part of master.
good luck with that
reminder to self: I ran https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/Shlomo/job/2023.1-artifacts-ami-arm-test/4/ and all tests passed. Need to go over the log and make sure it did what we expected it to do.
@ShlomiBalalis please try to get back to it, so we will merge it.
@ShlomiBalalis please return to it and update these tests so we can merge them.
rebased it, and giving a quick go in few jobs:
- [ ] 🔴 https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/artifacts-docker-test/29/
- [ ] 🔴 https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/artifacts-ubuntu2204-test/62/
- [ ] 🔴 https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/artifacts-ubuntu2404-test/11/
seems like it's currently broken and fail on some permission errors regardless CI isn't working on this PR, since the owner isn't in scylla anymore.
opening a new one: https://github.com/scylladb/scylla-cluster-tests/pull/8788