
implement test suite

Open seanjensengrey opened this issue 8 years ago • 7 comments

  • https://www.haskell.org/cabal/users-guide/developing-packages.html
  • http://stackoverflow.com/questions/1044555/how-can-i-set-up-a-simple-test-with-cabal

relates to #43

seanjensengrey avatar Jul 20 '16 13:07 seanjensengrey

Hi @seanjensengrey! Did you start working on this?

As per an email that I sent previously to Jamey, I started thinking about what shape an automated test suite for Corrode would have. There are several strategies that can be implemented, all of them being interesting in their own right, so we could probably do a mix.

  • The first strategy is the one that's documented already: using csmith to generate random test cases, compiling and running them with gcc/clang, then translating them with Corrode, compiling the result with rustc, and comparing the output of the two runs.
  • The second strategy is to try and translate various C projects and see what happens.

Both of these are interesting, especially in the current state of Corrode, as they can help us quickly come across a whole bunch of things that are yet to be implemented.

On the other hand, if we want to validate the existing codebase, I suggest that we implement more granular tests. These can take several forms:

  • Regression tests: having a suite of simple C files implementing parts of the C spec that Corrode currently supports would be useful. We would run them through the same "comparing with GCC" machinery. That would allow us to detect any regression in the working parts of Corrode.
  • An orthogonal and related thing to do would be to also add some tests for parts of the C spec that Corrode does not support. Obviously, we don't want to do that for everything, but it can be useful in at least two different cases: the case where we specifically don't want to support a feature and want to avoid accidentally supporting it, and the case where we want to support a feature in the near future, once the corresponding development has been done. That allows us to create test cases ahead of time: when the development is merged, we can change the test runner to expect those tests to pass instead of fail.
  • Finally, a more fine-grained approach is to do unit tests at the code level, that is, generating bits of AST using language-c, passing them through the Corrode machinery, and comparing the resulting Rust AST to something we expect. It is more work to set up, but it would probably be the fastest to run.

As a tangential note, I also want to add that despite the fact that we don't support I/O functions yet, there is a very primitive form of output we can use: the return value from main(). It is fairly easy to set up scripts to compare the return code from a process to some expected value.
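As a rough sketch of that return-code comparison (hypothetical harness code, using `python -c` processes as stand-ins for the compiled binaries, which in practice would be the gcc-built and corrode+rustc-built executables):

```python
import subprocess
import sys

def exit_code(cmd):
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode

# Stand-ins for the two binaries under test; a real harness would run
# e.g. ["./test1.gcc"] and ["./test1.rustc"] instead.
reference = [sys.executable, "-c", "import sys; sys.exit(45)"]
translated = [sys.executable, "-c", "import sys; sys.exit(45)"]

assert exit_code(reference) == exit_code(translated), "return codes differ"
print("return codes match:", exit_code(reference))
```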

lamarqua avatar Jul 29 '16 19:07 lamarqua

Hello @lamarqua , I have been pondering it in my hammock. I am brand new to Haskell so I'll be a little slow.

Great breakdown. The goals I had in the back of my mind are:

  • Encourage / Allow non-Haskell programmers to help
  • Be goal directed on the area of focus by translating real applications
  • Prevent regressions
  • Be lightweight, make it easy to add tests
  • Encourage Corrode to become functional for some subset of C rather quickly

Funny you mention using return codes, because that was my first test case.

I totally see merit in having AST-level, 10-line-file-level, and whole-project translation sources. The most expedient, I think, is a script to run corrode over a collection of C files and compare the return codes with expected output: embed the expected result in a comment, compile with the local C compiler, then translate and compile with Rust. Return codes should match.

I'd love to be able to run corrode against zlib or libasn1, but other issues, such as how to handle the preprocessor and multi-file builds, still need to be addressed.


Questions that a test suite could ask and answer:

  • What tasks need to be worked on?
  • Did it translate?
  • Did it translate correctly?
  • Did it translate how I expected?
  • Is it continuing to translate?
  • Did the resulting Rust code parse/compile/give the expected output?
  • What areas of the C spec are supported? What can I use?

It would be useful to see how gcc and clang handle their test suites. I like the idea of lifting, embracing as much as possible from other compilers.

  • https://gcc.gnu.org/install/test.html
  • https://gcc.gnu.org/testing/
  • https://gcc.gnu.org/wiki/HowToPrepareATestcase
  • https://github.com/gcc-mirror/gcc/tree/master/gcc/testsuite/gcc.dg

One thing we probably don't want to do is much negative testing: if a file doesn't compile with gcc/clang, then the output of corrode is undefined.

One thing I noticed is in something like

int main(int argc, char** argv) {
    int j = 0;
    for(int i = 0; i < 10; i++) {
        j += i;
    }
    return j;
}

The current rustc will log the following during build for the translated source

for1.rs:25:23: 25:31 warning: unused variable: `argc`, #[warn(unused_variables)] on by default
for1.rs:25 pub unsafe fn _c_main(mut argc : i32, mut argv : *mut *mut u8) -> i32 {
                                 ^~~~~~~~
for1.rs:25:39: 25:47 warning: unused variable: `argv`, #[warn(unused_variables)] on by default
for1.rs:25 pub unsafe fn _c_main(mut argc : i32, mut argv : *mut *mut u8) -> i32 {
                                                 ^~~~~~~~
for1.rs:25:23: 25:31 warning: variable does not need to be mutable, #[warn(unused_mut)] on by default
for1.rs:25 pub unsafe fn _c_main(mut argc : i32, mut argv : *mut *mut u8) -> i32 {
                                 ^~~~~~~~
for1.rs:25:39: 25:47 warning: variable does not need to be mutable, #[warn(unused_mut)] on by default
for1.rs:25 pub unsafe fn _c_main(mut argc : i32, mut argv : *mut *mut u8) -> i32 {
                                                 ^~~~~~~~
Maybe some results could be rolled up into a report, like the number of #[warn(unused_mut)] warnings, etc. These are things that could be addressed if further analysis passes are done, but they shouldn't be flagged as errors. An overly strict test suite would only slow things down at this stage.
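Tallying lint names from rustc's stderr would be enough for such a report; a sketch against the old-style warning format shown above (the log text is inlined here for illustration):

```python
import re
from collections import Counter

# Sample rustc stderr, as produced for the translated for-loop example.
rustc_log = """\
for1.rs:25:23: 25:31 warning: unused variable: `argc`, #[warn(unused_variables)] on by default
for1.rs:25:39: 25:47 warning: unused variable: `argv`, #[warn(unused_variables)] on by default
for1.rs:25:23: 25:31 warning: variable does not need to be mutable, #[warn(unused_mut)] on by default
for1.rs:25:39: 25:47 warning: variable does not need to be mutable, #[warn(unused_mut)] on by default
"""

# Count occurrences of each lint named in a #[warn(...)] attribute.
counts = Counter(re.findall(r"#\[warn\((\w+)\)\]", rustc_log))
for lint, n in sorted(counts.items()):
    print(f"{lint}: {n}")
```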

So I advocate for doing the simplest possible thing and seeing how it can be used to strengthen the feedback cycle of the project.

seanjensengrey avatar Jul 30 '16 04:07 seanjensengrey

I love this discussion, please keep it up! :smile:

I'd just like to point out one thing: I don't want to fix the unused_mut warnings, because it should always be possible to declare those variables const in the original C source, which causes Corrode to drop the mut annotation and fixes the warning. That means the Rust warning reflects a property of the original code (namely, that it allowed more mutability than it needed), so it's a useful diagnostic if someone just wants to use Corrode+rustc as a static analysis tool for C.

unused_variables warnings are a little trickier because the way to suppress equivalent GCC or Clang warnings is through attributes, but suppressing them in Rust is done by changing the variable name. I'm open to suggestions on that.

I'd probably choose to run tests with all compiler warnings disabled, though, at least for now. (-w for GCC/Clang, -A warnings for rustc. I did both in scripts/csmith-test, for example; it's especially important there because csmith makes no effort to generate "good" code.) Maybe as Corrode gets developed further, it'll become worthwhile to focus on minimizing warnings in the generated code?

jameysharp avatar Jul 31 '16 04:07 jameysharp

I have pushed a proof of concept to https://github.com/seanjensengrey/corrode/tree/sjg-test-poc/test

It only handles positive tests, checking the return code from the Rust binary. I'm not yet sure how to confirm that the translation was the expected one; for regressions, we could just store the corrode output and do a direct comparison.
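That direct comparison could work like the usual golden-file pattern; a sketch (the file layout and helper name are hypothetical):

```python
import tempfile
from pathlib import Path

def check_golden(name: str, actual_rust: str, golden_dir: Path) -> bool:
    """Compare freshly translated Rust against a stored golden file.
    If no golden file exists yet, record the current output as the baseline."""
    golden = golden_dir / f"{name}.rs.expected"
    if not golden.exists():
        golden.write_text(actual_rust)
        return True
    return golden.read_text() == actual_rust

with tempfile.TemporaryDirectory() as d:
    out = "pub unsafe fn _c_main() -> i32 { 45 }\n"
    assert check_golden("for1", out, Path(d))             # first run records the baseline
    assert check_golden("for1", out, Path(d))             # identical output passes
    assert not check_golden("for1", out + "\n", Path(d))  # changed output fails
print("golden-file checks behave as expected")
```

When Corrode's output changes intentionally, the baseline file would simply be regenerated and committed.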

The current test output looks like https://gist.github.com/seanjensengrey/711c0c9eb7a2004f49ca4c131aa4a4e2

seanjensengrey avatar Jul 31 '16 21:07 seanjensengrey

FWIW, for #90 I made myself a test harness to check that the gcc-built and corrode+rustc-built binaries generate identical output.

TESTS = pointer-arith-u8 pointer-arith-u32 pointer-arith-u32-cast
RUSTC = rustc
CORRODE = stack exec corrode --
CC = gcc

all: $(TESTS:%=test-%)

%.rustc : %.rs
    $(RUSTC) -o $@ $<

%.rs: %.c
    $(CORRODE) -o $@ $<

%.gcc : %.c
    $(CC) -o $@ $<

%.out: %
    ./$< > $@

test-%: %.gcc.out %.rustc.out
    diff $^

clean:
    rm -f $(TESTS:%=%.rustc) $(TESTS:%=%.rs) $(TESTS:%=%.gcc) $(TESTS:%=%.gcc.out) $(TESTS:%=%.rustc.out)

Given a TEST

  1. use gcc to compile to TEST.gcc
  2. use corrode to produce TEST.rs
  3. use rustc to produce TEST.rustc
  4. run both the gcc and rustc binaries to produce TEST.*.out output files
  5. run diff to compare the two

tko avatar Nov 08 '16 18:11 tko

I dropped this, and I apologize! I did add some randomized property testing for the goto work I merged recently, so there are some tests on master now, but still, nothing regularly tests that Corrode correctly translates expressions.

Tagging in @Marwes who is the latest person to express interest in this topic.

Regarding comparing the output of Corrode+rustc against GCC or Clang: I've already half-solved that twice in different ways, since both corrode-cc and csmith-test (in scripts/) need different parts of that. I think it'd be great to see this work start with refactoring those somehow.

Also, just running csmith regularly would be a great start. I think the goal is to invoke csmith-test on Travis-CI and include the C-Reduced test case in the build output on failure. I'm hoping that running about 10 csmith tests should be enough to catch bugs with reasonable probability, without making the build hit Travis-CI's timeout if it has to run C-Reduce. But it's possible that C-Reduce just takes too long to run on Travis-CI, in which case we'd want to dump the original csmith output somewhere.

FWIW, I don't know if csmith-test passes right now; I couldn't run it while I had control-flow translation screwed up and I've been afraid to try again since then :wink:

jameysharp avatar Apr 10 '17 19:04 jameysharp

No worries, I dropped it as well. :) Should we get a test label and break this up into smaller tasks?

  • https://embed.cs.utah.edu/csmith/
  • https://embed.cs.utah.edu/creduce/

Would it be helpful to get this stuff (csmith, creduce) running in a container? I could see doing a wider, deeper test pass using bulk cloud instances independent of Travis-CI.

seanjensengrey avatar May 03 '17 15:05 seanjensengrey