jflex icon indicating copy to clipboard operation
jflex copied to clipboard

Property test improvement

Open jcoultas opened this issue 3 years ago • 9 comments

This pull request increases coverage on property tests by improving generators and updating test cases. Improvements made to the generators to ensure cased characters are chosen with more frequency. Tests with caseless options have been added as a boolean with updates to allow caseless to operate correctly. For State Set updates to ensure resize code is called using the new OffsetGen generator. Also, property and additional code added to State Set to improve coverage.

I am a part of a research group at the University of Illinois Chicago that works towards novel methods to detect bugs in software projects. This project is identified as being a good candidate for application of our research method to improve coverage using property tests. The above issue outlines the findings with a pull request of the enhanced property test to document the observed behavior.

jcoultas avatar Apr 20 '22 00:04 jcoultas

Very nice to see a PR on test improvements!

I'm about to head out on vacation, so it'll be a bit before I can look at it in more detail, but I definitely appreciate the effort.

lsf37 avatar Apr 20 '22 00:04 lsf37

Hey @jcoultas, thanks for adding this, I can see what you're trying to do and I approve of the idea.

I'd make corner case values like 0, 1, MAX a little more likely, maybe up 5% even, but otherwise this looks good.

It does however fail to build at the moment, in particular at

jflex/src/test/java/jflex/state/StateSetQuickcheck.java:163: error: cannot find symbol
      @From(OffsetGen.class) int largeOffset) {

Are you planning to continue work on this?

lsf37 avatar Jun 19 '22 22:06 lsf37

@lsf37 I am interested in working on this still. I will review these comments and get back with you in a few days.

jcoultas avatar Jun 21 '22 16:06 jcoultas

@lsf37 I reviewed the build failure and the below test case is setup with seed values that produce the failure. I am not sure exactly what the issue is with this. Is this a bug? Or an issue with the updated test? Could you provide me with some insight?

In CharClassesQuickcheck...

  @Property
  public void addString(
      @When(seed = -5022752149301929905L) CharClasses classes,
      @When(seed = -4289856472919628505L) String s,
      @When(seed = 4027864125274679581L) @From(IntCharGen.class) int c,
      @When(seed = 3246250712413252630L) boolean caseless)
      throws UnicodeProperties.UnsupportedUnicodeVersionException {

    assumeTrue(s.indexOf(c) < 0);

    if (caseless) {
      classesInit(classes);
    }

    classes.makeClass(s, caseless);
    assertThat(classes.invariants()).isTrue();

    int cCode = classes.getClassCode(c);
    for (int i = 0; i < s.length(); ) {
      int ch = s.codePointAt(i);
      assertThat(classes.getClassCode(ch)).isNotEqualTo(cCode);
      i += Character.charCount(ch);
    }
  }

jcoultas avatar Jul 05 '22 16:07 jcoultas

I think this might have uncovered an actual bug, at least so far it doesn't look like anything to do with the test driver. It'd be great if we could minimise the example further to figure out what is actually going on, i.e. what the trigger is.

I've saved the test logs, so it's fine to keep changing things, we won't lose the specific test case.

lsf37 avatar Jul 05 '22 23:07 lsf37

Hm, actually, thinking more about it, it might be an issue with the test after all. We're asserting

assertThat(classes.getClassCode(ch)).isNotEqualTo(cCode);

where ch is a code point in the string. We should not be expecting the char class of all code points in the string to be distinct from cCode for caseless scanners.

In a scanner that distinguishes case, each character in the string should become a separate class and we'd want the assertion to hold. But: in a scanner that does not distinguish case, the string might be ab, and the character might be A. A doesn't occur in the string, so the assumeThat is true, but a and A will have the same class, because they are supposed to be treated as equivalent.

lsf37 avatar Jul 05 '22 23:07 lsf37

@lsf37 That makes sense. What do you think would be a good assume/assert for the caseless version of addstring? It seems that we would need write an assume that would ensure that all characters of s, do not include the class of c. But, not sure how much we would want to lean on the software under test to build up the assume.

jcoultas avatar Aug 30 '22 01:08 jcoultas

@lsf37 Just checking in to see if you have any ideas or guidance on the above.

jcoultas avatar Sep 16 '22 13:09 jcoultas

Sorry, no super bright ideas on this one. Basically, for caseless you'd need to check that if the class is equal the code point c and the code point in the string are case-equivalent. IIRC we have a bunch of Unicode utility classes for that. Of course those classes are used to compute the char classes in the first place, but you'd at least get that the char class construction as such is good (but missing out on wether the classification as case-equivalent is really correct).

lsf37 avatar Sep 20 '22 06:09 lsf37

:warning: 16 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

sonatype-lift[bot] avatar Sep 29 '22 03:09 sonatype-lift[bot]

@lsf37 I've updated one test with an assumeThat which seems to allow caseless to run. Please review and let me know of any issues that would prevent this PR from being merged.

jcoultas avatar Oct 11 '22 03:10 jcoultas

@lsf37 I actually was able to get addString to fail even with my new assumeThat. So, I am going to remove the addString updates so we have passing test cases.

jcoultas avatar Oct 11 '22 16:10 jcoultas

@lsf37 This should be ready to review again.

jcoultas avatar Oct 11 '22 16:10 jcoultas

This is looking mostly good from my side modulo the still unresolved comments (the ones back from 20 Jun). Nice to see the build passing!

lsf37 avatar Oct 23 '22 22:10 lsf37

Hi @lsf37 - I think that I have addressed the above items. Please review and let me know of any noticed issues.

jcoultas avatar Nov 07 '22 00:11 jcoultas

Hi @lsf37 - Just want to check in for feedback on this PR.

jcoultas avatar Nov 22 '22 22:11 jcoultas

Sorry, things have been busy lately. I'm happy with the StateSet interface and the new changes. What about the comment on assertThat(setPre.contains(set)).isFalse();?

When that is addressed, I think we are ready to merge.

lsf37 avatar Nov 22 '22 23:11 lsf37

Looks all good now, this has come together really nicely. Thanks again for your contribution and patience.

lsf37 avatar Dec 04 '22 09:12 lsf37