
Recommendation: Smoke tests for main

Open mikelevins opened this issue 1 year ago • 30 comments

In order to better prevent situations where merges to main break the build, I suggest that we create a suite of smoke tests to be run on a branch before merging it. I'm willing to write the tests, but I'm probably not the best person to do it. I didn't write most of the system and so I don't know the best way to test that most of it is working.

Probably the best person to write the basic, overall sanity tests is Kaveh. @kaveh808, I would be happy to work with you to get a set of basic sanity tests set up.

mikelevins avatar Sep 18 '22 16:09 mikelevins

Is it possible to test a merged branch before doing the pull? I.e. do a merge, test, then pull into main?

The demos should double as tests, but of course I can write more tests if you let me know what sort of tests are needed.

kaveh808 avatar Sep 19 '22 18:09 kaveh808

(Oops deleted comments that were on the wrong thread, mostly redundant with existing comments on #114.)

lukego avatar Sep 19 '22 18:09 lukego

Is it possible to test a merged branch before doing the pull? I.e. do a merge, test, then pull into main?

The demos should double as tests, but of course I can write more tests if you let me know what sort of tests are needed.

I'd like to see us adopt a testing framework that provides easy automated test runs with failure reporting. I usually use Gary King's Lift, but other folks may prefer testing tools.

An approach that I've used successfully in the past is to create a test directory that lives alongside the main sources directory. For each source file whose contents I want to test, there's a corresponding source file in test/. For example:

project/
  src/
    bar.lisp
    foo.lisp
  test/
    bar.lisp
    foo.lisp

test/bar.lisp contains definitions of tests for features that are defined in src/bar.lisp.

There's the usual system definition that defines how to load the project, plus a test-system definition that depends on the main system (so that loading the test system first loads the main system), loads the tests, and provides an entry point that runs all of them and prints a report of the results.
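For concreteness, here's a sketch of what such a test-system definition might look like (all system, file, and function names here are purely illustrative, not the actual kons-9 layout):

(asdf:defsystem #:my-project/tests
  :depends-on (#:my-project #:lift)  ; loading the test system loads the main system first
  :pathname "test/"
  :serial t
  :components ((:file "foo")
               (:file "bar"))
  :perform (test-op (op c)
             ;; RUN-ALL-TESTS is a hypothetical entry point defined in the test sources
             (uiop:symbol-call '#:my-project/tests '#:run-all-tests)))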

Test frameworks like Lift and others can be configured to tell us which tests succeeded and which failed. If set up in a convenient way, you can run all of the system's tests (or some subset of them) as easily as you can load the system. That makes it convenient to run the sanity checks every time you think you're ready to commit and push.

Admittedly, it does call for a little grunt work setting things up in advance. If you decide you want to go this way, I can help get things set up.

If other folks have experience with other test frameworks and like them better, let's hear about them. If there are better tools, we should use them.

mikelevins avatar Sep 20 '22 00:09 mikelevins

I agree we need a testing framework, based on @mikelevins comments here and @lukego comments in #114. Both breaks in the main branch have been due to my carelessness.

Mikel, please take the lead on this. If you guys can set up a testing framework, I'm happy to start populating it with tests, initially for the kernel files.

As always, my inclination is something simple, with as few dependencies as possible. :)

kaveh808 avatar Sep 20 '22 00:09 kaveh808

Has anyone reviewed #7, an open PR that introduces a test framework?

lukego avatar Sep 20 '22 04:09 lukego

I had a look at it, but when it was done the code was changing too much for a testing framework to be practical. If you guys feel it is worth considering now, please do so.

kaveh808 avatar Sep 20 '22 18:09 kaveh808

Has anyone reviewed #7, an open PR that introduces a test framework?

I looked at it, but the PR depends on CCL. It's possible that an updated version of it might work, but on first glance it looks to me like it would probably be more straightforward to branch the current main and introduce some testing framework that someone likes.

I've used a few different ones over the years. The one I've ended up using most of the time is Gary King's Lift, though I'm not wedded to it. If someone has one they feel strongly about, I'm okay with that. I'm agnostic about the one used in the previous PR; I know nothing about it.

mikelevins avatar Sep 20 '22 18:09 mikelevins

I agree we need a testing framework, based on @mikelevins comments here and @lukego comments in #114. Both breaks in the main branch have been due to my carelessness.

Mikel, please take the lead on this. If you guys can set up a testing framework, I'm happy to start populating it with tests, initially for the kernel files.

As always, my inclination is something simple, with as few dependencies as possible. :)

Okay, I'll do it, @kaveh808. @lukego, @foretspaisibles, if you want to pitch some specific test framework and discipline, let me know. If not, I'll probably just set something up that I'm used to doing.

mikelevins avatar Sep 20 '22 18:09 mikelevins

Kaveh, once I have something or other in place to run and report the tests, I'll need to consult with you about what the basic smoke tests should be. We can talk more about it then, and take recommendations from everyone. But I think @kaveh808 is probably in the best position to know what the basic sanity checks need to be.

mikelevins avatar Sep 20 '22 18:09 mikelevins

Has anyone reviewed #7, an open PR that introduces a test framework?

I looked at it, but the PR depends on CCL.

Oh, I forgot about that, so I will post a fixed PR. I wrote Confidence because I prefer test frameworks that stay simple and do not introduce unnecessary concepts; representing test cases by (instrumented) functions makes it easy to rely on Common Lisp constructs and idioms to implement test fixtures, etc., instead of introducing new framework-specific abstractions and gadgets.

I never tried Lift, but if @mikelevins has used it for many years it is probably a good pick. :-)

foretspaisibles avatar Sep 21 '22 14:09 foretspaisibles

I updated the PR so that it uses SBCL. In the shell, running the full test suite or part of it yields the following output (% is the shell prompt)

% development/testsuite                
Name: KONS-9/TESTSUITE:RUN-ALL-TESTS
Total: 19
Success: 19/19 (100%)
Failure: 0/19 (0%)
Condition: 0/19 (0%)
Outcome: Success

A specific test suite:

% development/testsuite exercise-strcat
Name: KONS-9/TESTSUITE:EXERCISE-STRCAT
Total: 2
Success: 2/2 (100%)
Failure: 0/2 (0%)
Condition: 0/2 (0%)
Outcome: Success

Testsuites can also be run from SLIME, just using the function name. (e.g. KONS-9/TESTSUITE:RUN-ALL-TESTS)

foretspaisibles avatar Sep 21 '22 14:09 foretspaisibles

Has anyone reviewed #7, an open PR that introduces a test framework?

I looked at it, but the PR depends on CCL.

Oh, I forgot about that, so I will post a fixed PR. I wrote Confidence because I prefer test frameworks that stay simple and do not introduce unnecessary concepts; representing test cases by (instrumented) functions makes it easy to rely on Common Lisp constructs and idioms to implement test fixtures, etc., instead of introducing new framework-specific abstractions and gadgets.

I never tried Lift, but if @mikelevins has used it for many years it is probably a good pick. :-)

Well, let's not just assume anything I touch is good. I can report that's definitely not the case.

I use Lift because it makes the following things easy to do:

  • Create a parallel source structure for tests, so that for each source file that you want to test, you can easily create a parallel test file, and it's obvious at a glance which test file goes with which source file
  • Create an ASDF test system that depends on the main system being tested, loads the tests, and provides a single point of entry that runs all of the tests and reports on all the results
  • Define simple points of entry for running subsets of the full test suite, with reporting on which tests failed and how
  • Write failure conditions and reporting in a clear and self-documenting way

As a simple example of what that looks like, you can look at

https://github.com/mikelevins/folio2/blob/master/folio2-as-tests.asd

That file defines Lift tests for the AS subsystem in folio2. At the bottom of the asdf file in a comment is a form that runs the AS tests.

The file

https://github.com/mikelevins/folio2/blob/master/folio2-tests.asd

defines a system that aggregates all of the folio2 tests, and a similar comment at the bottom of the file runs all of the folio2 tests.

To see what test sources look like, take a look at tests/as.lisp. Note that the file tests/as.lisp parallels the source file it tests, src/as.lisp. It's important to me to be able to see at a glance which tests refer to which source files. (Of course some tests, particularly integration tests, do not refer to specific source files, and so should not be positioned as if they do. But that's easy enough to do.)
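To make the shape concrete, here is a minimal sketch of a Lift suite and test case (the suite name and the check are illustrative only, not taken from folio2 or kons-9):

(lift:deftestsuite sanity-tests () ())

(lift:addtest (sanity-tests)
  string-upcase-works
  (lift:ensure-same (string-upcase "kons-9") "KONS-9" :test #'string=))

;; Run the suite and print a report of which tests passed or failed.
(lift:describe-test-result (lift:run-tests :suite 'sanity-tests) t)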

mikelevins avatar Sep 21 '22 15:09 mikelevins

I updated the PR so that it uses SBCL. In the shell, running the full test suite or part of it yields the following output (% is the shell prompt)

% development/testsuite                
Name: KONS-9/TESTSUITE:RUN-ALL-TESTS
Total: 19
Success: 19/19 (100%)
Failure: 0/19 (0%)
Condition: 0/19 (0%)
Outcome: Success

A specific test suite:

% development/testsuite exercise-strcat
Name: KONS-9/TESTSUITE:EXERCISE-STRCAT
Total: 2
Success: 2/2 (100%)
Failure: 0/2 (0%)
Condition: 0/2 (0%)
Outcome: Success

Testsuites can also be run from SLIME, just using the function name. (e.g. KONS-9/TESTSUITE:RUN-ALL-TESTS)

This looks fine to me. Could you have a look at my discussion of the use of Lift in folio2, in another comment here, and offer your comments? Do you feel that Confidence addresses all of the same concerns equally well or better?

I actually like the use of test-suite objects and other features of Lift because they provide a handy organizing principle as the number of individual tests grows, but I'm happy to hear a pitch for the Confidence approach. Maybe it's better.

mikelevins avatar Sep 21 '22 15:09 mikelevins

Create a parallel source structure for tests, so that for each source file that you want to test, you can easily create a parallel test file, and it's obvious at a glance which test file goes with which source file

Yes, that's the convention used in projects using Confidence, but it's just a convention: Confidence does not force anyone to follow it, and it is possible to organise things differently. Since tests are represented by regular Lisp functions, it's easy to navigate them using the IDE's facilities (jump to source), and Confidence itself does not inspect environment variables when files are loaded or compiled.

Create an ASDF test system that depends on the main system being tested, loads the tests, and provides a single point of entry that runs all of the tests and reports on all the results

Yes, this is what happens. If some test fails, an analysis is written to the output. If started from an interactive session, it returns a structure that contains every detail about the run and allows restarting failed tests. Confidence does not provide a point of entry itself, because it prefers explicit code; the user has to define an entry point, as in the following examples:

  • https://github.com/melusina-org/kons-9/blob/introduce-testsuite/testsuite/entrypoint.lisp#L11
  • https://github.com/melusina-org/cl-rashell/blob/main/testsuite/testsuite.lisp#L16 (other project)
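For instance, an entry point is typically just another test case that calls the others; a rough sketch (the nested test names are illustrative):

(define-testcase run-all-tests ()
  ;; nested test cases; DEFINE-TESTCASE aggregates their results into one report
  (exercise-strcat)
  (exercise-point-cloud))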

Define simple points of entry for running subsets of the full test suite, with reporting on which tests failed and how

Yes, all tests are functions defined with DEFINE-TESTCASE and are marked as exported in the package owning them. When defined by DEFINE-TESTCASE, the function is instrumented so that the results of nested test calls are aggregated, conditions signalled while evaluating arguments are detected and documented, etc.

Having accurate failure reports for tests was a motivation for writing Confidence instead of using 5AM or similar tools built around an IS macro. Confidence provides ASSERTIONS that can be defined with DEFINE-ASSERTION and are essentially predicates which are also expected to provide details when they fail. For instance:

KONS-9/TESTSUITE> (assert-string= "A text" "Another text")
NIL
"Assert that STRING1 and STRING2 satisfy the STRING= predicate.
This assertion supports the same keyword parameters as STRING=.
Every character of STRING1 is equal to the character of STRING2 at
the same index upto index 1. However this condition does not hold for characters
at position 1, which are #\\  and #\\n."

Write failure conditions and reporting in a clear and self-documenting way

When arguments to an assertion or a test case signal a condition, Confidence generates output like this:

KONS-9/TESTSUITE> (define-testcase demonstrate-condition-handling ()
		    (assert-string= "A text" (error "This form does not evaluate safely.")))
DEMONSTRATE-CONDITION-HANDLING
KONS-9/TESTSUITE> (demonstrate-condition-handling)
Name: DEMONSTRATE-CONDITION-HANDLING
Total: 1
Success: 0/1 (0%)
Failure: 0/1 (0%)
Condition: 1/1 (100%)
Outcome: Failure
================================================================================
#<ASSERTION-CONDITION {700985F443}> is an assertion result of type ASSERTION-CONDITION.
Type: :FUNCTION
Name: ASSERT-STRING=
Path:
  DEMONSTRATE-CONDITION-HANDLING
Arguments:
 Argument #1: "A text"
 Argument #2: #<SIMPLE-ERROR "This form does not evaluate safely." {700971A3D3}>
Form: (ASSERT-STRING= "A text"
                      (ERROR "This form does not evaluate safely."))
Outcome: Condition
Description:
  In this call, forms in argument position evaluate as:

  STRING1: "A text"

  STRING2: (ERROR "This form does not evaluate safely.") => CONDITION
    The evaluation of this form yielded a condition

#<SIMPLE-ERROR "This form does not evaluate safely." {700971A3D3}>
  [condition]

Slots with :INSTANCE allocation:
  FORMAT-CONTROL                 = "This form does not evaluate safely."
  FORMAT-ARGUMENTS               = NIL
#<CONFIDENCE:TESTCASE-RESULT :NAME DEMONSTRATE-CONDITION-HANDLING :CONFIDENCE::TOTAL 1 :CONFIDENCE::SUCCESS 0 :CONFIDENCE::FAILURE 0 :CONDITION 1 {700971A133}>

Confidence also defines floating-point comparisons as in Knuth's books, using the same terminology, but using another test framework would not prevent importing the corresponding comparison operations.

Lastly, the Confidence codebase is quite small, about 1 kloc as reported by cloc, for what it's worth, which is a fraction of the size of other testing frameworks.

foretspaisibles avatar Sep 21 '22 15:09 foretspaisibles

A second motivation to write Confidence was to have tests represented as functions. Having a test suite built around an IS macro is an invitation to write a lot of macros around it, which is a small nuisance in my opinion; functions are easier to work on. Having tests represented as objects has, for me, the downside of breaking natural code navigation, since it is not possible to look at the code's entry point and figure out which tests are running. Usually test objects declare which test suite they participate in, so the relation visible in the code runs in the direction opposite to the code's execution flow. I prefer a covariant approach. :-)

foretspaisibles avatar Sep 21 '22 16:09 foretspaisibles

A second motivation to write Confidence was to have tests represented as functions. Having a test suite built around an IS macro is an invitation to write a lot of macros around it, which is a small nuisance in my opinion; functions are easier to work on. Having tests represented as objects has, for me, the downside of breaking natural code navigation, since it is not possible to look at the code's entry point and figure out which tests are running. Usually test objects declare which test suite they participate in, so the relation visible in the code runs in the direction opposite to the code's execution flow. I prefer a covariant approach. :-)

Your answers to my questions are encouraging. Let me build some tests for one of my small projects using Confidence, to satisfy myself that there aren't any unpleasant warts that will vex us (also, if you know of such warts, and of ways that you are accustomed to working around them, do please let us know).

Assuming that things turn out to be as good as they sound, I'll report back here and recommend that we adopt Confidence. If I find something that concerns me, I'll raise questions about it here.

mikelevins avatar Sep 21 '22 18:09 mikelevins

I am very happy to see some interest in this and I am curious to hear your comments! Feel free to raise any questions about Confidence here. Maybe @lukego is also interested in this specific topic, having a general interest in testing.

If you have texinfo installed, you can run development/makedoc to build the documentation; it produces a PDF file in obj/makedoc (also INFO and HTML).

I am not sure how good Confidence is for people who prefer to have the tests stop on each error so that they can open the debugger and look at things. I am more used to using the debugger as a last resort, and prefer to let the test suite run to completion to validate my assumptions or answer questions I have, testing often so that I test individual features as I develop them and test everything before I push. However, if other people like another workflow better and can describe it clearly, I would be happy to improve their experience.

I tried tools with an interactive debugging style, like Fiasco or 5AM, but I did not stay attached to that approach.

foretspaisibles avatar Sep 22 '22 16:09 foretspaisibles

I just added documentation builds to GitHub Actions, so that it is possible to download the PDF file produced by makedoc here, without installing texinfo and texlive oneself:

https://github.com/melusina-org/cl-confidence/actions/runs/3130460088

The files are at the bottom.

foretspaisibles avatar Sep 26 '22 19:09 foretspaisibles

I've added Lift and Confidence to one of my small projects, and have added a couple of simple tests using Lift. I'll next add equivalent tests with Confidence so that I can report on any notable advantages and disadvantages.

mikelevins avatar Sep 26 '22 20:09 mikelevins

(Sorry I'm taking so long on this; I have several things competing for my time at the moment.)

@foretspaisibles [UPDATED] I found what I was looking for: ASSERT-CONDITION.

mikelevins avatar Oct 05 '22 11:10 mikelevins

I tried writing a couple of simple tests for one of my smaller projects using both Lift and Confidence. It should be no surprise that I found it easier to write them with Lift, since I've been using it for several years, and this is my first exposure to Confidence. That being the case, I don't think that suggests anything is wrong with Confidence.

I found the packages quite similar. @foretspaisibles emphasizes that Confidence is composed of functions and is low ceremony, but in practice I didn't notice a lot of difference because of that. The amount of code that went into tests using both frameworks was almost identical.

Lift uses some macrology to define named test suites and then add test cases to them. Confidence uses some macrology to define test cases (which are some slightly special functions). Both packages offer sets of convenient operators for detecting conditions used to signal whether a test has succeeded.

The main practical difference in my tests was that I know how to ask Lift to print an informative report on the results of a test run, and I couldn't easily find how to do the same with Confidence. For example, after running a simple test that I deliberately designed to fail, and calling LIFT:DESCRIBE-TEST-RESULT, I get the following output:

Failure: lift-tests : test-gen-100
  Documentation: NIL
  Source       : /Users/mikel/Workshop/src/nym/lift-tests/tests.lisp
  Condition    : Ensure failed: (= 10 (LENGTH NAMES)) ()
  During       : (END-TEST)
  Code         : (
  ((LET* ((SAMPLES
           (NYM::READ-SAMPLES
            (ASDF/SYSTEM:SYSTEM-RELATIVE-PATHNAME :NYM "data/us.names")))
          (NAMES (NYM::GEN-CHUNK-TRAVESTIES SAMPLES 100)))
     (LIFT:ENSURE (= 10 (LENGTH NAMES))))))

This report alerts me to a failed test, identifies the failing test by name, and prints the relevant code of the failing test, which makes it easy for me to track down the issue.

It may be that Confidence offers the same kind of features--or even better ones--but I didn't find examples of that or an explanation of how to achieve the same effect in the docs. Of course, I may well have overlooked something. Confidence does return an informative data structure describing the outcomes of executed tests, and it may be that those structures can be used to generate reports similar to those produced by Lift.

As of this morning, if I were choosing between Lift and Confidence, I would choose Lift, simply because I've had a good experience with using it and I know how to use it.

I think a decent counterargument might be that the author of Confidence is a contributor to kons-9 and the author of Lift isn't. That fact might by itself be a reason to choose Confidence, especially if its author is open to extending it with conveniences it lacks, or advising us on how to do things that we don't find obvious (like producing the kind of report I referred to above).

mikelevins avatar Oct 05 '22 13:10 mikelevins

Thank you @mikelevins for your detailed comments.

I think a decent counterargument might be that the author of Confidence is a contributor to kons-9 and the author of Lift isn't. That fact might by itself be a reason to choose Confidence, especially if its author is open to extending it with conveniences it lacks, or advising us on how to do things that we don't find obvious (like producing the kind of report I referred to above).

Actually I am not only open about this but quite enthusiastic! :-) In the initial phases I worked by following the comparison of https://sabracrolleton.github.io/testing-framework and I would be very happy to continue the work on Confidence.

foretspaisibles avatar Oct 05 '22 18:10 foretspaisibles

Thank you @mikelevins for your detailed comments.

I think a decent counterargument might be that the author of Confidence is a contributor to kons-9 and the author of Lift isn't. That fact might by itself be a reason to choose Confidence, especially if its author is open to extending it with conveniences it lacks, or advising us on how to do things that we don't find obvious (like producing the kind of report I referred to above).

Actually I am not only open about this but quite enthusiastic! :-) In the initial phases I worked by following the comparison of https://sabracrolleton.github.io/testing-framework and I would be very happy to continue the work on Confidence.

@foretspaisibles, would you be willing to add a small example use of Confidence to its repo that shows simple uses of its basic features? If the repo has that, I'll feel good about recommending that we adopt it for testing.

Also, if the example could show us how to ask Confidence to display a message when tests are complete--similar in spirit to the example from Lift--I think that would be helpful. If there's not a simple, convenient way to generate such messages, you may consider this a feature request. It doesn't need to do everything exactly the way Lift does it, but I would be grateful for a simple way to request a summary message that tells whether tests succeeded or failed, names any that failed, and, if possible, provides some clues to the nature of the failure.

Thanks for your enthusiasm!

mikelevins avatar Oct 07 '22 13:10 mikelevins

I'll look into that and add comments here!

foretspaisibles avatar Oct 08 '22 16:10 foretspaisibles

@mikelevins I wrote an example file with top-level evaluation forms like what we have in KONS-9; do you find it useful regarding your questions? If so, I could also move part of the contents to the README or Texinfo file; otherwise I'm happy to refine it! (A public repo using Confidence is https://github.com/melusina-org/cl-rashell, which could also be useful as a reference.)

https://github.com/melusina-org/cl-confidence/blob/record-result/example/example.lisp

foretspaisibles avatar Oct 11 '22 15:10 foretspaisibles

@mikelevins I wrote an example file with top-level evaluation forms like what we have in KONS-9; do you find it useful regarding your questions? If so, I could also move part of the contents to the README or Texinfo file; otherwise I'm happy to refine it! (A public repo using Confidence is https://github.com/melusina-org/cl-rashell, which could also be useful as a reference.)

https://github.com/melusina-org/cl-confidence/blob/record-result/example/example.lisp

Yes, your link is exactly what I had in mind, thanks! I think it's probably enough to add a reference to that example file in the README. The source is quite clear.

I'm familiar with texinfo, but I don't use it much, so I don't really know the most convenient way to read it (unless it's just turning it into info files and reading it with info or emacs). Do you have tools that you typically use to read texinfo files?

If other contributors use texinfo as little as I do (or even less) maybe we should consider some more-used doc format? Or at least adding some notes about the easiest way to read the docs.

mikelevins avatar Oct 11 '22 19:10 mikelevins

Nice to read! My changes made it to main and I added a link in the README. The documentation is created on each push by GitHub Actions as artefacts (PDF, INFO, HTML), but I am not sure yet how to make them more visible from the README file.

https://github.com/melusina-org/cl-confidence#introduction-for-new-users

foretspaisibles avatar Oct 12 '22 05:10 foretspaisibles

Nice to read! My changes made it to main and I added a link in the README. The documentation is created on each push by GitHub Actions as artefacts (PDF, INFO, HTML), but I am not sure yet how to make them more visible from the README file.

https://github.com/melusina-org/cl-confidence#introduction-for-new-users

Great, @foretspaisibles, I appreciate the work. Could you help @kaveh808 write a simplest-possible smoke test for kons-9? The idea would be to pick some very basic test that helps Kaveh (and other contributors) see how to set up the test case, checks for some very basic feature that Kaveh judges to be important as a base test case, and reports clearly on success or failure.

Ideally, it would be something that Kaveh can rely on as a basic sanity test that the system is loading and working at a basic level (he'll have to identify what that test would be). Also ideally, it would take the form of a test that is really easy to intentionally sabotage so that it produces a clear and understandable failure message (you'll have to show us how best to do that).
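To give a rough idea of the shape I have in mind (everything here is a guess on my part: POINT-CLOUD is a stand-in for whatever kernel class Kaveh picks, and ASSERT-T assumes Confidence has a plain truth assertion):

(define-testcase exercise-basic-kernel ()
  ;; POINT-CLOUD is a placeholder for the most basic thing worth checking;
  ;; ASSERT-T is assumed here -- substitute Confidence's actual truth assertion if it is spelled differently.
  (assert-t (typep (make-instance 'kons-9::point-cloud) 'kons-9::point-cloud)))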

Also, it would be great if it can be loaded and controlled from the kons-9 system, or from a kons-9-test system. We'll eventually want the test code to provide some conveniences for things like running all test cases or only a subset, for controlling how much logging and reporting it does, and so forth, but if we can just get a basic simple smoke test going, it should become easier to start to add those features.

I'll pitch in where and when I can. If either of you don't have the bandwidth to work on it, let me know and I'll try to get it going, but I'm trying to keep up with a few other things, too, and I don't know these two pieces of the picture as well as the two of you do.

mikelevins avatar Oct 15 '22 18:10 mikelevins

That sounds interesting, yes! @kaveh808, do you have any part of the code that looks like a good candidate for this? Otherwise I'll just start with some point-related routines and maybe curve-related functions.

foretspaisibles avatar Oct 17 '22 17:10 foretspaisibles

Hmm... point-cloud.lisp is the most basic geometry class. Do you think it would be a good candidate?

kaveh808 avatar Oct 17 '22 18:10 kaveh808