jflex
jflex copied to clipboard
Property test improvement
This pull request increases coverage on property tests by improving generators and updating test cases. Improvements made to the generators to ensure cased characters are chosen with more frequency. Tests with caseless options have been added as a boolean with updates to allow caseless to operate correctly. For State Set updates to ensure resize code is called using the new OffsetGen generator. Also, property and additional code added to State Set to improve coverage.
I am a part of a research group at the University of Illinois Chicago that works towards novel methods to detect bugs in software projects. This project is identified as being a good candidate for application of our research method to improve coverage using property tests. The above issue outlines the findings with a pull request of the enhanced property test to document the observed behavior.
Very nice to see a PR on test improvements!
I'm about to head out on vacation, so it'll be a bit before I can look at it in more detail, but I definitely appreciate the effort.
Hey @jcoultas, thanks for adding this, I can see what you're trying to do and I approve of the idea.
I'd make corner case values like 0, 1, MAX a little more likely, maybe up 5% even, but otherwise this looks good.
It does however fail to build at the moment, in particular at
jflex/src/test/java/jflex/state/StateSetQuickcheck.java:163: error: cannot find symbol
@From(OffsetGen.class) int largeOffset) {
Are you planning to continue work on this?
@lsf37 I am interested in working on this still. I will review these comments and get back with you in a few days.
@lsf37 I reviewed the build failure and the below test case is setup with seed values that produce the failure. I am not sure exactly what the issue is with this. Is this a bug? Or an issue with the updated test? Could you provide me with some insight?
In CharClassesQuickcheck...
@Property
public void addString(
@When(seed = -5022752149301929905L) CharClasses classes,
@When(seed = -4289856472919628505L) String s,
@When(seed = 4027864125274679581L) @From(IntCharGen.class) int c,
@When(seed = 3246250712413252630L) boolean caseless)
throws UnicodeProperties.UnsupportedUnicodeVersionException {
assumeTrue(s.indexOf(c) < 0);
if (caseless) {
classesInit(classes);
}
classes.makeClass(s, caseless);
assertThat(classes.invariants()).isTrue();
int cCode = classes.getClassCode(c);
for (int i = 0; i < s.length(); ) {
int ch = s.codePointAt(i);
assertThat(classes.getClassCode(ch)).isNotEqualTo(cCode);
i += Character.charCount(ch);
}
}
I think this might have uncovered an actual bug, at least so far it doesn't look like anything to do with the test driver. It'd be great if we could minimise the example further to figure out what is actually going on, i.e. what the trigger is.
I've saved the test logs, so it's fine to keep changing things, we won't lose the specific test case.
Hm, actually, thinking more about it, it might be an issue with the test after all. We're asserting
assertThat(classes.getClassCode(ch)).isNotEqualTo(cCode);
where ch is a code point in the string. We should not be expecting the char class of all code points in the string to be distinct from cCode for caseless scanners.
In a scanner that distinguishes case, each character in the string should become a separate class and we'd want the assertion to hold. But: in a scanner that does not distinguish case, the string might be ab, and the character might be A. A doesn't occur in the string, so the assumeThat is true, but a and A will have the same class, because they are supposed to be treated as equivalent.
@lsf37 That makes sense. What do you think would be a good assume/assert for the caseless version of addstring? It seems that we would need write an assume that would ensure that all characters of s, do not include the class of c. But, not sure how much we would want to lean on the software under test to build up the assume.
@lsf37 Just checking in to see if you have any ideas or guidance on the above.
Sorry, no super bright ideas on this one. Basically, for caseless you'd need to check that if the class is equal the code point c and the code point in the string are case-equivalent. IIRC we have a bunch of Unicode utility classes for that. Of course those classes are used to compute the char classes in the first place, but you'd at least get that the char class construction as such is good (but missing out on wether the classification as case-equivalent is really correct).
:warning: 16 God Classes were detected by Lift in this project. Visit the Lift web console for more details.
@lsf37 I've updated one test with an assumeThat which seems to allow caseless to run. Please review and let me know of any issues that would prevent this PR from being merged.
@lsf37 I actually was able to get addString to fail even with my new assumeThat. So, I am going to remove the addString updates so we have passing test cases.
@lsf37 This should be ready to review again.
This is looking mostly good from my side modulo the still unresolved comments (the ones back from 20 Jun). Nice to see the build passing!
Hi @lsf37 - I think that I have addressed the above items. Please review and let me know of any noticed issues.
Hi @lsf37 - Just want to check in for feedback on this PR.
Sorry, things have been busy lately. I'm happy with the StateSet interface and the new changes. What about the comment on assertThat(setPre.contains(set)).isFalse();?
When that is addressed, I think we are ready to merge.
Looks all good now, this has come together really nicely. Thanks again for your contribution and patience.