DeepSea functional tests

This topic is long. I did not know my exact audience, but imagined a developer with little Salt experience. Also, this is truly a work in progress. I reserve the right to change names of directories and files. :)

=== DeepSea has qa tests but these are written with Teuthology in mind. This implies that the environment is always a fresh set of VMs and that resetting the Ceph cluster is not a requirement.

Having access to these tests would be valuable for developers. The current workflow is expensive though in terms of time. Develop a set of changes. Build another environment (or destroy the current development environment depending on available hardware), pick a basic suite from the deepsea-qa package, run it, debug any environment related issues and correct, verify that the suite still succeeds. Now, rinse and repeat for other suites. Whether or not a developer knows which suites are safe to skip is based on experience.

As a result, I do not know anyone that does this rigorously.

So, let's take a step back and realize that developers and Teuthology have overlapping but two different sets of requirements. Teuthology is building a small set of VMs and executing some steps that prove some functionality is working. Developers have a working cluster but would like to know if their recent changes have broken any functionality.

Originally when I delved into the qa suite, I had the idea of states or orchestrations trying to call any of the suites. With manual experimentation, I found that this approach costs more than necessary. For an 18 node cluster, the calls to all the stages resulted in a 14-17 minute runtime. Additionally, the output is so long (especially with a particular Salt 2017 bug printing nuiscance stack traces continually) that any issue requires effort to investigate.

After a closer look at health-ok.sh, I realized that I only needed the tests after the stages.

I converted one example that I do not plan to keep, but left it here as an example. The example is create write_test pool. While this works, anyone debugging this has a few hoops to jump through. Second, anyone cutting/pasting the line without the subshell will pollute their shell. In all, sourcing a file to call a function is a bit opaque and requires reading the code. I believe this can and should be more friendly to both developers and admins.

The next example I realized is creating an on-the-fly shell script and then invoking Salt. Sticking with the original plan of using Salt, this flow would be Salt->sourced file->function->on-the-fly script->Salt. Since I ended with what I started, I implemented the rados_write function in Salt directly. This is rados write.

This state can be executed in one line with the terse result displaying each command.

# salt 'admin*' state.apply ceph.tests.rados_write 2>/dev/null
admin.ceph:
  Name: ceph osd pool application enable write_test deepsea_qa - Function: cmd.run - Result: Changed Started: - 22:28:58.695754 Duration: 999.538 ms
  Name: echo 'dummy content' > /tmp/verify.txt - Function: cmd.run - Result: Changed Started: - 22:28:59.695418 Duration: 7.773 ms
  Name: rados -p write_test put test_object /tmp/verify.txt - Function: cmd.run - Result: Changed Started: - 22:28:59.703318 Duration: 48.915 ms
  Name: rados -p write_test get test_object /tmp/verify_returned.txt - Function: cmd.run - Result: Changed Started: - 22:28:59.752426 Duration: 38.5 ms
  Name: cmp /tmp/verify.txt /tmp/verify_returned.txt - Function: cmd.run - Result: Changed Started: - 22:28:59.791056 Duration: 7.973 ms

Summary for admin.ceph
------------
Succeeded: 5 (changed=5)
Failed:    0
------------
Total states run:     5
Total run time:   1.103 s

This is not terribly different than the shell function. The set -ex will print each line and fail if any do. (This call could be placed directly in that function so that we only have one set of steps as well.)

One separate advantage of making the steps a Salt file is orchestrations. The health-ok.sh acts like an organization scheme to meet the needs of Teuthology; however, developers do not need the additional steps. Also, developers may wish to group a series of tests differently. The same test can be called as

# salt-run state.orch ceph.smoketests.ceph 2>/dev/null
admin.ceph_master:
  Name: create pool - Function: salt.state - Result: Changed Started: - 22:36:12.895980 Duration: 1543.765 ms
  Name: rados write - Function: salt.state - Result: Changed Started: - 22:36:14.439843 Duration: 1235.438 ms

Summary for admin.ceph_master
------------
Succeeded: 2 (changed=2)
Failed:    0
------------
Total states run:     2
Total run time:   2.779 s

Note that running the higher level orchestration does not show all the detailed steps of each state. The result does cater to development. Keep successes short and summarized. If something fails, allow the developer to drop down one level to run the exact test. Finally, seeing the actual commands that fail should keep the develop/test cycle as short as possible.

Another advantage is that ceph.smoketests can come to many any subset of orchestrations that are considered necessary. The same orchestrations and states can be grouped by DeepSea subsystem allowing subsequent developers to pick the appropriate level of testing for a set of changes. For example, run the rgw config smoketests and then rgw smoketests and finally the normal smoketests during the development phases. (I did not say all since I expect some tests may not be working or appropriate.)

Some may have noticed that the rados write test does not test DeepSea but Ceph. That's true. DeepSea is lacking many functional tests.

I did create another example that should illustrate how complex some of this can get, but I believe using Salt to test Salt is the best approach.

The simple test is to change the default warning from 30 PGs to 16 PGs per OSD. The edit is here.

The check is here.

The problem is that the file change and processes are on different nodes. The actual steps necessary to run the two files above is config mon smoketest.

The Salt IDs explain the steps and a majority of the states already are part of DeepSea. When run the output looks like

# salt-run state.orch ceph.smoketests.config.mon 2>/dev/null
admin.ceph_master:
  Name: change ceph.conf - Function: salt.state - Result: Changed Started: - 22:49:42.007512 Duration: 354.729 ms
  Name: create ceph.conf - Function: salt.state - Result: Changed Started: - 22:49:42.362388 Duration: 12124.079 ms
  Name: apply configuration - Function: salt.state - Result: Changed Started: - 22:49:54.486636 Duration: 1473.992 ms
  Name: restart mons - Function: salt.state - Result: Changed Started: - 22:49:55.960772 Duration: 517.179 ms
  Name: check change - Function: salt.state - Result: Changed Started: - 22:49:56.478106 Duration: 5396.836 ms
  Name: restore ceph.conf - Function: salt.state - Result: Changed Started: - 22:50:01.875041 Duration: 341.284 ms
  Name: recreate ceph.conf - Function: salt.state - Result: Changed Started: - 22:50:02.216472 Duration: 12260.828 ms
  Name: restore configuration - Function: salt.state - Result: Changed Started: - 22:50:14.477452 Duration: 438.173 ms
  Name: reset mons - Function: salt.state - Result: Changed Started: - 22:50:14.915727 Duration: 1460.337 ms

Summary for admin.ceph_master
------------
Succeeded: 9 (changed=9)
Failed:    0
------------
Total states run:     9
Total run time:  34.367 s

One thing to note is that the last four steps have nothing to do with the test, but will reset the cluster back to the default. This entire test is destructive and will not attempt to preserve the mon.conf, but not leaving a mess of configuration changes helps the developer. Also, resetting the environment must work or we have found yet another issue.

With respect to Teuthology and other qa suites, any of the individual states or orchestrations can be wrapped in a shell function.

For those that appreciate the writing on the wall, the intent is to require stubs and eventual functional tests with PRs. I plan to be lenient because I still want contributions. However, our pace may slow for a while adding these functional tests.

Mar 15 '18 22:03 swiftgist

Using salt to test salt makes sense to me.

I want to pick apart the last test example a bit if I may. The test itself puts a setting in the mon.conf file. So the goal of the test is to make sure that DeepSea actually applies that configuration to the mons. The SES docs say to run stage 3 after tweaking config files, which is what a user will presumably do, but the config mon smoketest only does ceph.configuration.create, ceph.configuration and ceph.mon.restart.force. Stage 3 does all the other stage 3 stuff in addition to ceph.configuration.create and ceph.configuration, but seems to eventually use ceph.mon.restart.controlled, not ceph.mon.restart.force. So there's some subtle difference in how the mons may be restarted depending on whether it's a user invoking stage 3, or a developer invoking this functional test. Is this a problem, or am I just massively overanalysing the example? :-)

Mar 28 '18 06:03 tserong

the pending branch wip-functional-tests includes a functional test for ceph.conf generation which might still be useful.

Aug 03 '18 12:08 jschmid1

DeepSea DeepSea copied to clipboard

functional tests

DeepSea
DeepSea copied to clipboard