
Better and simpler way to test the tool

Open peterwvj opened this issue 9 years ago • 34 comments

Recently we've experienced that tasks related to testing the tool are becoming increasingly complex. This is partly caused by the way we store and test the standard examples (those that are bundled with the tool) and the SCSK tests, as well as how we update test results.

In a recent email discussion there was a proposal to have a separate repo with the external tests and the means to execute them, which would obviously mean that the repo would include some test framework and would depend (via Maven) on the Overture jar. Do people agree that this is the right approach to take?

I think that Luis has done a good job with setting up a test framework and I'm generally happy with the way it works. The problem with updating result files, as I see it, is that you often have to search the Java sources in order to find out what property you need to pass to update certain result files. I have seen that for some tests the update property is output to the console (in case of test failure), but for other tests that does not seem to be the case. It might be that I am wrong, though. Generally I think the idea of passing properties to update the result files is good, but it should be a straightforward task to find out what property needs to be passed. Do you see other ways that we can improve this?

I hope we can use this issue to discuss and improve testing of the tool.

@ldcouto @nickbattle @lausdahl

added the current list of tasks to the main issue. -- ldc

  • [ ] Test Framework documentation
    • [x] Overview page describing the workings of the test framework
    • [x] How-to page explaining how the framework should be used. This should include:
      • [x] how to write a new test;
      • [x] how to add new inputs for existing tests;
      • [x] how to use external input files when testing (via property)
      • [x] how to write tests that use the external inputs;
      • [x] how to inspect test failures and update results.
  • [ ] Test Framework improvements
    • [ ] Delete the old test framework, core/testframework, and move all tests to the new one, core/testing. The name of the new test framework module can then be changed to the old one, if people care about that.
    • [ ] Standardise test framework properties. The update property should be as follows: -Dtest.update.[module].[test].[testcase]. module refers to the module being tested; for simplicity we should use the Maven module names. test refers to the test class; we should use the same name as the class. testcase refers to a specific instance of the parameterised test; we should refer to these by the full name of the input file. (See the sketch after this list.)
    • [x] All test failures should report
      • [x] Source info if the test originates from a file
      • [x] The result file path
      • [x] The property that must be set to override the test result
  • [ ] Test inputs and results
    • [ ] Inputs and results for standard tests should be stored together under src/test/resources. It is up to the maintainer of each module to organise the tests in subfolders however they want. There are, however, two restrictions: the framework will only support one path to the root of the test inputs, and in order to differentiate between the dialects, the folder containing the files must be named for its dialect (pp, sl or rt). An example: src/test/resources/bugregression/pp.
    • [x] Example tests are removed from the main build. A separate integration test project may use them at a later date.
    • [ ] Results for external tests need to be stored under src/test/resources and mirror the folder structure (and file names) of the external tests. Speaking of which, the extensions of the external test inputs should be changed so that they match those of the standard ones: vpp to vdmpp and vdm to vdmsl.
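
To make the proposed property convention concrete, here is a minimal sketch (not actual Overture code; the module, class and input file names below are purely illustrative) of how a test could derive the standard update property and check whether it has been set:

public class UpdatePropertySketch {
    static String updateProperty(String module, String testClass, String inputFile) {
        return "test.update." + module + "." + testClass + "." + inputFile;
    }

    static boolean updateRequested(String module, String testClass, String inputFile) {
        // -Dfoo with no value sets the property to the empty string, so a null check suffices.
        return System.getProperty(updateProperty(module, testClass, inputFile)) != null;
    }

    public static void main(String[] args) {
        // Would be triggered by e.g. mvn test -Dtest.update.interpreter.InterpreterTest.record-01.vdmsl
        System.out.println(updateRequested("interpreter", "InterpreterTest", "record-01.vdmsl"));
    }
}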

peterwvj avatar Nov 19 '15 18:11 peterwvj

So, I've read this and also read through #474

It seems like the problems are:

  • making sure all tests are correctly run (this includes passing the external inputs and making sure you have the right examples)
  • tracking down test failures to individual models
  • updating test results
  • general confusion about how many kinds of tests we have, etc. For the record, there are 3 kinds of tests ("normal", examples and externals). We also have two test frameworks which only helps make things more confusing.

I completely agree that the current setup is too complicated. This is actually an important thing. If tests are hard to run, people will just not care and skip them. That leads to trouble. This is even more important for hypothetical new developers since they are more likely to break things.

I volunteer to overhaul the test setup. One thing I want to do is kill the old framework. I can also improve error reporting to give full file paths (or something) as well as report the update property. Standardising the properties is also probably a good idea.

However, before we do anything, I'd like to get a list of requirements down for this. So, fire away.

ldcouto avatar Nov 19 '15 18:11 ldcouto

So (cough) sorry for the earlier rant, but I think Luis is absolutely right that if we don't get the test framework easy enough for new users, it will just be ignored. And that is a huge problem.

I suppose from a "test user" point of view, what I want is for tests that fail to be clearly listed, for it to be clear where those test files (Java and VDM) are, and for it to be obvious how to change either the expected result or the test spec itself - be that automatic or manual. I would also like it to be easy to add tests, so that I am not put off the idea of doing so when I introduce new language features.

We might be limited to some extent by what the surefire plugin does, though presumably we can print out useful information at the start of all tests (can that be done for failing tests only?)

We also have to worry about what derived products need (Compass etc.). I think this was mentioned when I was confused before about Git submodules. I remain to be convinced that submodules are simple enough for general users to work with - if I don't get it immediately (and I didn't) that's probably a fail.

Separate tests (with Maven dependencies on Overture) are attractive, but the problem is keeping versions in sync when new changes are made to the language. But if it brings great simplicity, that might be a good trade.

Sorry this is all a bit of a jumble, but I'm thinking on the fly :-)

nickbattle avatar Nov 19 '15 18:11 nickbattle

Good that we have the discussion going :)

I agree that for test failures the framework should (at least) report the full path of the test input and the full path of the expected result, as well as the properties required to update the result files.

We should, however, also consider the situation where changes to the result files are substantial, and perhaps, affect hundreds of files covering multiple tests. If the test framework reports only the properties of the individual tests then it becomes a tedious task to update the test results. I've experienced something like this before.

Can we somehow make it easy to update all the result files for a single project or maybe the entire core? How should the test framework handle this situation? How should it report the properties?

peterwvj avatar Nov 19 '15 19:11 peterwvj

Concerning the "separate tests" issue. Should this repo only test the standard examples (those that bundle with the tool) and not the SCSK tests? We could also simplify testing of the standard examples to only check that the they type check without errors, do not crash for PO generation etc. to avoid recording detailed information about each test (like you once suggested, Luis). I guess the important thing is that the examples don't crash the tool.

Do we keep the SCSK tests as they are? Maybe this is a reasonable solution if we update the test framework to work better.

peterwvj avatar Nov 19 '15 20:11 peterwvj

We have here tried to summarize the issue above into two categories:

  • [ ] Make sure that the wiki properly describes the current features of the framework
  • [ ] Improve test framework
    • [ ] Unify the properties used for result override for all projects. The idea would be to use a folder-specific pattern. For the type checker this could be: -Dtests.core.interpreter.external.classespp.impl.pp.type.type-01, and then have the option to use -Dtests.core.interpreter.external.classespp.impl.pp.type.* to override all tests in the test group, or -Dtests.core.interpreter.* to override all interpreter tests (or any sub-group in between). See the sketch after this list.
    • [x] Use ordered collections (lists) to store warnings/errors to avoid unnecessary re-ordering of errors/warnings when updating test results
    • [ ] Improve the test framework such that any failing test prints out:
      • [ ] Source info if the test originates from a file
      • [ ] The result file path
      • [ ] The property that must be set to override the test result
  • Improve test origination
    • Shall test input files (vdm test specifications) always be stored together with the expected result in the same repo? (PVJ & KEL suggest NO)
    • Shall we change the testing of the examples we bundle with Overture to simply check that they type check correctly, do not crash on PO generation etc.? By doing that we can avoid the use of result files.
    • Shall the testing of the examples we bundle with Overture be moved to an examples repository, together with the result files, and only be run before a release by the release manager? This would be a completely separate Maven module that depends on org.overturetool.core:interpreter and resides outside the overturetool repository.
    • Shall we keep the external property for the SCSK tests? (PVJ & KEL suggest YES)
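
For illustration, a sketch (plain Java, not the actual framework; the property names are just examples) of how such group-level overrides could be resolved against a dot-separated property hierarchy:

public class GroupOverrideSketch {
    static boolean overrideRequested(String fullName) {
        if (System.getProperty(fullName) != null) {
            return true; // exact test case, e.g. tests.core.interpreter.external.classespp.impl.pp.type.type-01
        }
        // Walk up the hierarchy and accept any enclosing wildcard, e.g. tests.core.interpreter.*
        String prefix = fullName;
        while (prefix.contains(".")) {
            prefix = prefix.substring(0, prefix.lastIndexOf('.'));
            if (System.getProperty(prefix + ".*") != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(overrideRequested("tests.core.interpreter.external.classespp.impl.pp.type.type-01"));
    }
}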

We think that the Improve test framework item described above can be implemented as described and will address the issues related to the framework itself. However, the item Improve test origination needs further discussion, so please provide your opinion.

//Cheers Kenneth and Peter J

lausdahl avatar Nov 23 '15 12:11 lausdahl

I like the idea of the simple hierarchical property names to control the automatic result re-generation. That is simple and intuitive (though it would still be useful to print it out when a test fails!).

One thing that isn't explicitly mentioned above is that we need a simple way for people to be able to add new tests and results. Perhaps the answer is that we already just search folders for filenames of particular patterns and treat these as specifications and the corresponding results? If so that's good, though there are perhaps different conventions for syntax, TC or runtime errors, and success result files? And we don't want a test hierarchy that's so "scattered" that it is not obvious where a new test file/result would go.

While we're in the area, can we change the XML file format to be treated as an ordered list rather than a set of errors? That way, rebuilding results that only re-order the file won't show up in commits.
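
As a plain-Java illustration of the ordering point (nothing Overture-specific here): a List serialises in the order the errors were recorded, so a rebuild that does not change the errors produces an identical file, whereas a Set gives no ordering guarantee.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrderingSketch {
    public static void main(String[] args) {
        List<String> asList = new ArrayList<String>();
        Set<String> asSet = new HashSet<String>();
        for (String e : new String[] { "error A", "error B", "error C" }) {
            asList.add(e);
            asSet.add(e);
        }
        System.out.println(asList); // insertion order: [error A, error B, error C]
        System.out.println(asSet);  // iteration order is unspecified and need not match insertion order
    }
}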

Keeping test results apart from the tests themselves is tricky. It allows us to have multiple versions of Overture that treat the same tests differently (and each one is "right"). But you can also see that it introduces a coupling between Overture and its (separate) test sources that isn't ideal. It may be that different versions of Overture want to change the test sources as well as the results. But separating them completely would require branches in the test repo that match those in the Overture repo (e.g. the ncb/development branch would have an associated ncb/development branch in the test repo). That pattern can cope with arbitrary changes, but people would have to be careful when they were merging other people's branches. It's almost an argument for keeping tests and results all under the Overture repo. Not sure what's best here. You didn't list the reasons for your suggestion to separate them above?

If we say that the "examples" are supposed to be working examples of specifications for users to look at, then it seems reasonable that they should all be "OK" tests with a result. Why would a user want to see a spec that has errors in it? So they could be "source and results together". That would simplify the tests and move them out of the picture - though release time would involve some work if they weren't checked occasionally on the interim builds.

Did the last point mean "the external repository"? Unless SCSK or Kyushu or whoever owns these tests now can give us a waiver to use them and distribute them, surely we are obliged to keep them separate? Given that, I don't think we can include the results files for them with Overture either. So maybe this is also "source and results together"?

nickbattle avatar Nov 23 '15 13:11 nickbattle

Good point about using ordered collections to store warnings/errors. I've edited Kenneth's comment to include it as an improvement (yes you can do that on Github).

The way you currently add new tests mostly follows Maven and JUnit 4 practices. For example, all tests go into src/test/java, test resources go into src/test/resources, and so on. Luis also put up some wiki pages describing how to use the testing framework. It's a bit unclear to me how we should improve this part of the framework. Also, can you elaborate on what you mean when you say that the test hierarchy is "scattered"?

Regarding testing of the standard examples (those that are bundled with the tool). What we mean is to change those tests to only check if the examples type check correctly, do not crash on PO generation etc. If we do that we can completely get rid of the .result files for the standard examples. This will simplify maintenance and testing of the standard examples a lot, and at least we can check that the examples don't crash the tool.

I guess what we are trying to say with regards to the SCSK item is that we want to keep it as it is. Right now we store the SCSK result files in the Overture repo. Ideally we would want to keep the test input files there as well, but as you have already pointed out, we are not allowed to do that. By including the result files in the overture repository at least we make sure that everybody has the most recent result files. The test input files rarely change so that's probably why we haven't experienced too many problems keeping them in a separate repo.

peterwvj avatar Nov 23 '15 15:11 peterwvj

I haven't checked this, and it may be my misunderstanding, but are you saying that to add a new test (that is just "process this spec and see what you get"), you always have to amend Java test files? I was hoping that in some (many?) cases, the test framework would just "run" everything under a certain location called "xxx.vdmsl" and expect the results that are in "xxx.results". So to add a new test to that part of the test suite, I can just create two (data) files, not amend the Java classes.

If that sort of test handling is possible, it would make it easy for newcomers?
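
For illustration, a minimal sketch of that data-driven style using a standard JUnit 4 parameterised test (this is not the Overture framework itself, and the folder and file names are only examples): it picks up every *.vdmsl file under a folder and pairs it with the matching *.result file, so adding a test means adding two data files and no Java.

import java.io.File;
import java.util.ArrayList;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class FolderDrivenTestSketch {
    @Parameters(name = "{0}")
    public static Collection<Object[]> inputs() {
        Collection<Object[]> tests = new ArrayList<Object[]>();
        File root = new File("src/test/resources/bugregression/sl");
        File[] files = root.listFiles();
        if (files != null) {
            for (File spec : files) {
                if (spec.getName().endsWith(".vdmsl")) {
                    // Each spec is paired with its expected result, stored alongside it.
                    File result = new File(spec.getPath().replace(".vdmsl", ".result"));
                    tests.add(new Object[] { spec, result });
                }
            }
        }
        return tests;
    }

    private final File spec;
    private final File result;

    public FolderDrivenTestSketch(File spec, File result) {
        this.spec = spec;
        this.result = result;
    }

    @Test
    public void checkAgainstStoredResult() {
        // Process 'spec' with the tool and compare its output against 'result' here.
    }
}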

The feeling of it being "scattered" is related to the above. If I wanted to create a new spec and results file for (say) some pure operations, where would I put them? (Or if I have to amend the Java, which class would I add to, or where would I create a new class?). I think that the test directory hierarchy is quite complicated (because we have a lot of tests), hence the tests are scattered around - though not illogically so. But when a hierarchy gets beyond a certain size, people who don't have the "shape" in their heads are likely to add things in the "wrong" place. I'm thinking of new users again. But there's not much we can do about that.

Does it really add much to the examples suite to check that a run of the spec produces a given result? I suppose it does simplify management - though how would we know if one of the examples had been broken by a release and no longer produced the result that it used to (or crashes)?

When I was originally working with VDMJ and the CSK tests (as they were then), I simply had the JUnit tests for the suite in a separate Eclipse project (nothing to do with VDMJ apart from the dependency on its jar). Then at regular intervals, I could just run the Eclipse JUnit tool and see all 3200-odd tests work and get a "green line" (which took about 10 seconds, amazingly). When things broke, the JUnit tool gave me a list of what no longer worked. And if I (manually) determined that they were all expected changes, I could re-generate the results by setting a flag and running the tests. So that was easy, convenient, and very efficient, and I could easily maintain the results. If we could get back to something like that I'd be very happy.

nickbattle avatar Nov 23 '15 16:11 nickbattle

Some of the stuff suggested here, such as using partial properties to update all tests or running tests on all sources in a folder, is already implemented. I'm sad we have done a poor job of explaining these features.

I think we should write up a guide for using the test framework before we actually implement the new features. That way, we can use it as a specification of sorts.

We seem to be converging on a solution but there still seem to be a lot of open issues with organizing the test inputs and results files.

For the record, I think most inputs should be in the src/test/resources of whatever module they are used for (TC, Interpreter, etc.). The results should be stored together with the inputs.

The example tests seem to be the ones causing the most trouble. I suggest we just take them out of the build and make a separate project for running these tests and have the build server take care of that. We can test every push of development or something. It should also be possible for individual developers to run these tests on their machines with local builds. There are some issues with this approach so maybe we should discuss it separately.

The external tests we could also move to a separate project or keep them as they are, using the maven property. That raises the question of where to store the results for these tests.

ldcouto avatar Nov 23 '15 16:11 ldcouto

@nickbattle

For most situations I think the tests are set up in the way you describe. In the type checker, interpreter, PO and code generator the input and result files are stored in the same location. The result files are (mostly) stored in JSON format which makes it a bit difficult to write them up by hand. So normally I just generate the result file and check that it's correct (it's still human-readable). I can do that without changing any Java files - I just pass a maven property to generate the result file.

How much it adds to the examples suite to check that a run of the spec produces a given result clearly depends on how much data you store. For large examples that generate hundreds of POs it may become difficult to inspect the output manually when it changes (some of this might be because of the re-ordering issue that we currently have). For the interpreter tests it may not add much if all you are interested in storing is the expected result of evaluating a model.

We used to set a flag to update the .result files. The problem with this approach is that people sometimes forget to unset the flag before they commit (yes, it really has happened). I guess this is why Kenneth changed it to use properties instead. Personally I think it's better to pass properties, although I agree that it's very confusing at the moment. If you prefer, you can also pass them from the launch configuration in Eclipse as "VM arguments".

peterwvj avatar Nov 24 '15 22:11 peterwvj

Ok, this thread is at risk of being abandoned. Do not want that!

So, after much thinking, I've come around on the examples tests. They should be moved out of the build. We can set up an integration test project for them afterwards if that's relevant. The examples should only be tested in preparation for a release. That way, we update the examples, if needed, and ship them as part of the new release.

That leaves the externals tests. Do people still want them in the main build (triggered via Maven property) as has been the case so far?

--ldc

ldcouto avatar Dec 06 '15 14:12 ldcouto

I've also gone through the tasks list and moved it to the issue root. It's a better place. I've removed and changed a few things from there.

  • update property: The idea of breaking up the test updates by dialect, type, etc. is nice but really that's a big pain for test writing and forces us to track lots of different properties. A simple system that ALL tests follow is better, IMO
  • using collections: this is way too restrictive. Not every module produces results that make sense in collections (ex: codegen). On the contrary, the test framework should place no restrictions on the type of results: only that it be serialisable. Speaking of which, I would like us to reach a decision on XML vs JSON. My vote goes for JSON since it's easier and faster to implement serialisation.

ldcouto avatar Dec 06 '15 14:12 ldcouto

I agree - let's move the examples tests out of the tool repo. Also, I'm happy with testing them very superficially and only make sure they don't crash the tool.

Regarding the external tests (the SCSK tests) ...

Our own tests are not very good so I'm in favour of keeping the SCSK ones. It's a bit annoying that we have to keep the test input files in the AU GitLab repo, but until VDMTools goes open-source I guess there is no way around this. When VDMTools goes open-source I guess we can move the test input files to the tool repo so that we have them stored together with the result files.

I'm happy with using JSON. I always start out by generating the result files anyway.

peterwvj avatar Dec 06 '15 15:12 peterwvj

From the NM today, it sounds like it may still take a while before they go open source. We probably shouldn't wait for it. So, for now we use the GitLab thing, as you say.

ldcouto avatar Dec 06 '15 16:12 ldcouto

One thing about the SCSK test suite. Even if they release the tests under an open source licence, we cannot directly use their test suite because it is tailored to run with VDMTools (not surprisingly). When I first got the suite from PGL, I spent a lot of time converting them into a form where I could run them as JUnit tests - lots of sed scripting and moving things about. That's not to say that a common language test suite wouldn't be a good thing - it would be a very good thing! - but we can't just pick it up and use it, if they release the source.

On the other hand, if they do release the source (which we copied) then presumably it is easier for us to handle the tests in the sense that we don't need to keep them isolated from other GPL components.

nickbattle avatar Dec 07 '15 09:12 nickbattle

Yep. GPL allows you to modify. So we can just use your modified suite, I hope :).

I agree that the dream scenario would be the inputs as pure vdm text files and some easily parsed specification of the result. But this is probably not something we can maintain. I'm fine with using the SCSK inputs as regression tests.

ldcouto avatar Dec 07 '15 14:12 ldcouto

Hi guys,

I am implementing some of these new features and I'd like opinions on something.

When reporting test failures, we want to inform the "user" of input and result paths and update property. These are not a problem.

However, notification of test failures is made in the test assertions. At the moment, assertions are left entirely to the test writer since different modules will want to assert different things. So, everyone has to remember to call the test info method in their messages. I have Javadoc reminding people to do this but it did not seem to help any for the update property, so that might not be the way to go.

I can change the framework so that the assertion and message are handled automatically and test developers simply implement the comparison method (see below). I can also provide a hook for developers to customize the test failure message (otherwise it will just be Expected: expected.toString(). Got: actual.toString()).

public boolean compareResults(R actual, R expected) {
    // add complex results comparison here
}
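
For concreteness, a rough sketch (not the actual framework code) of how the framework could drive that hook: the assertion and the failure message are handled in a base class, and the message hook can be overridden.

import org.junit.Assert;

public abstract class ResultComparisonSketch<R> {
    // Test writers only implement the comparison; the framework handles the assertion.
    protected abstract boolean compareResults(R actual, R expected);

    // Hook that can be overridden to customise the failure message.
    protected String failureMessage(R actual, R expected) {
        return "Expected: " + expected + ". Got: " + actual;
    }

    // Called by the framework once a test case has produced its actual result.
    protected final void checkResult(R actual, R expected) {
        Assert.assertTrue(failureMessage(actual, expected), compareResults(actual, expected));
    }
}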

However, unexpected behavior such as null pointers inside the test will cause errors that do not report this information. I think we have to accept this. The only way I see around it is to catch all exceptions and chain it with some custom exception carrying the test info.
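
The exception-chaining idea could look something like this small sketch (assumed, not implemented; testInfo stands for whatever string carries the input path, result path and update property):

public final class TestInfoWrapperSketch {
    // Runs a test body and re-throws any unexpected exception chained into an
    // AssertionError that carries the test info, so even internal errors point
    // at the input, result file and update property involved.
    public static void runReporting(Runnable testBody, String testInfo) {
        try {
            testBody.run();
        } catch (RuntimeException e) {
            throw new AssertionError("Unexpected error in " + testInfo, e);
        }
    }
}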

Thoughts?

ldcouto avatar May 22 '16 12:05 ldcouto

Great Luis!

I can change the framework so that the assertion and message are handled automatically and test developers simply implement the comparison method (see below). I can also provide a hook for developers to customize the test failure message (otherwise it will just be Expected: expected.toString(). Got: actual.toString()).

Assuming I understand you, I think this sounds reasonable. I support your proposal about having a hook, which I assume is a method you must override to achieve custom behavior. Generally I think it's useful to provide a default behavior that can be overridden, if desired.

However, unexpected behavior such as null pointers inside the test will cause errors that do not report this information. I think we have to accept this. The only way I see around it is to catch all exceptions and chain it with some custom exception carrying the test info.

For "internal errors", I think that's fine.

peterwvj avatar May 22 '16 14:05 peterwvj

Ok, that's one decision.

There are 2 more issues though:

  • With this approach we cannot use assertEquals. This is a pity because most IDEs allow you to click an assertEquals failure and get a diff of the results.
  • Computing a message for test failures will be very similar to performing the results comparison. It would be efficient to do both things at once but I can't think of a clean way to do it. We could return a tuple with an optional message field for false comparisons. But tuples are evil...

ldcouto avatar May 22 '16 14:05 ldcouto

Regarding 1. That's a shame - not sure I fully understand why though. But I guess you can just apply the diff tools yourself then. I don't think it's a show-stopper.

Regarding 2. Is it a problem to keep it separate? I agree that a generic tuple class kind of sucks, so maybe it is better to use a dedicated results container to store the information. What do you think about that?

peterwvj avatar May 22 '16 15:05 peterwvj

  1. If we have a custom comparison method that method needs to be wrapped in an assertTrue. When this method fails, there is no notion of expected or actual to diff.
  2. The problem with keeping them separate is that you probably have to write the same code twice. I think that's a deal breaker. I suppose a container could work but that's more stuff for people to learn.

How about we just provide a default result comparison, based on assertEquals? And we provide the option to completely override the comparison?
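
As a sketch of that compromise (again assumed rather than actual framework code), the default comparison can delegate to assertEquals, so IDEs keep their expected/actual diff view, while modules with results that do not compare well with equals() override the method:

import org.junit.Assert;

public abstract class DefaultComparisonSketch<R> {
    // Supplies the input path, result file path and update property for failure messages.
    protected abstract String testInfo();

    // Default comparison: assertEquals keeps the expected/actual pair available,
    // so IDEs can still offer a diff on failure. Override for structured results.
    protected void compareResults(R actual, R expected) {
        Assert.assertEquals(testInfo(), expected, actual);
    }
}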

ldcouto avatar May 22 '16 15:05 ldcouto

Thanks for explaining.

Fair point. I think what you are suggesting sounds like a reasonable solution.

peterwvj avatar May 22 '16 17:05 peterwvj

@ldcouto Is it not possible to provide two methods, one with assertEquals and one with assertTrue? Like:

public boolean compareResults(R actual, R expected) {
    // add complex results comparison here
}

and

public boolean compareEqualResult(R actual, R expected) {
    // add complex results comparison here
}

gkanos avatar May 23 '16 07:05 gkanos

We could but we'd have to work out the work flow for that. How would you suggest we do it?

At the moment the lone comparison method gets called automatically during test execution.

ldcouto avatar May 23 '16 11:05 ldcouto

Right guys. I have updated the framework to implement the features we discussed (branch ldc/ntf. Should be safe to merge. The changes are backwards-compatible).

I've also begun writing the documentation for it (see https://github.com/overturetool/overture/wiki/Testing).

We still need to convert some of the existing tests out of the old framework. Likewise, tests already implemented in the new framework need to be modified so they invoke testInfo() in their failures. Without this, the update information won't be printed.

I have begun doing this with the POG tests. Once I'm done with those, I'll update the documentation on how to write new tests.

ldcouto avatar May 23 '16 20:05 ldcouto

Sounds good, Luis. I look forward to seeing the doc. I would like to try it on some of the codegen tests.

peterwvj avatar May 23 '16 23:05 peterwvj

@ldcouto I will see what changes you have made and I will read the doc. My initial thought is that the user will write the comparison using the method that calls the assertion they want: when writing the comparison for compareEqualResult, assertEquals will be called, else assertTrue in the other case.

gkanos avatar May 24 '16 06:05 gkanos

Right, the thing I can't figure out is how the framework will tell which one the user has overridden.

Plus, I want something that's conceptually simple and I can't find a good way to explain the two-assert version, even if I could implement it.

ldcouto avatar May 25 '16 19:05 ldcouto

Ok, the wiki page is updated. There is some advanced stuff TBD but let's wait until people use it and see if it's useful.

You can check out the new improvements by merging with ldc/ntf. Try to edit one of the sample result files to see the new failure messages.

The next step is to migrate all the modules to the new framework. I would appreciate some help with that.

ldcouto avatar Jun 02 '16 21:06 ldcouto

Just letting you know that I merged ldc/ntf into pvj/main

peterwvj avatar Jun 16 '16 16:06 peterwvj