defects4j
Simplify check out of non-minimized buggy version
Currently, Defects4J's version ids have the following format: <id>(f|b), where b refers to the minimized buggy and f to the fixed version.
We should change the version id format to: <id>(f|b|b-min|b-orig)
- b-min is minimized buggy
- b-orig is non-minimized buggy
- b is an alias for b-min (current behavior)
To be consistent, we could add the same suffixes for the (f)ixed version -- f-min would not be supported at this time.
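For illustration only, here is a minimal Python sketch of how the proposed buggy-version suffixes could be parsed and normalized. The function name `parse_version_id`, the assumption that `<id>` is a positive integer, and the choice of Python are all hypothetical and not part of Defects4J; additional suffixes for the fixed version are left out because they have not been decided yet.

```python
import re

# Hypothetical sketch of the proposed format <id>(f|b|b-min|b-orig).
# Assumption: <id> is a decimal integer.
_VERSION_ID = re.compile(r"(?P<bug_id>\d+)(?P<suffix>f|b|b-min|b-orig)")

# "b" remains valid as an alias for "b-min" (current behavior).
_ALIASES = {"b": "b-min"}


def parse_version_id(version_id):
    """Return (bug_id, canonical_suffix); raise ValueError if malformed."""
    match = _VERSION_ID.fullmatch(version_id)
    if match is None:
        raise ValueError(f"invalid version id: {version_id!r}")
    suffix = match.group("suffix")
    # Map aliases to their canonical long form; other suffixes pass through.
    return int(match.group("bug_id")), _ALIASES.get(suffix, suffix)
```

With this sketch, `parse_version_id("1b")` and `parse_version_id("1b-min")` resolve to the same version, while an unsupported id such as `1f-min` is rejected.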
@mernst, @Greg4cr, @jose, any thoughts?
This is consistent with what I said in the email thread. I'm in favor of this change.
@jose, you implemented some form of this in the "garbage-in, garbage-out" paper, right?
This sounds great. Thanks for the suggestion.
Agree with this. @rjust, do you want to make it obvious? I wonder if we could just define it as f|b|bm|bo|?
And, instead of min and orig, should we introduce the concept of pure (i.e., minimal) vs. impure (i.e., non-minimal) and define it as f|b|b-pure|b-impure| or just f|b|bp|bi|?
@jose, you implemented some form of this in the "garbage-in, garbage-out" paper, right?
Yep. Once we all agree on the suffixes to use I would be happy to implement it.
I am inclined to be more verbose to avoid confusion and mistakes.
I don't feel strongly about b-min vs. b-pure. I'd like to keep b-orig, though, to indicate that this is the original buggy version without any modifications to the source code.
Since we are already making b effectively an alias for backward-compatibility, adding bo as an alias for b-orig (and bm as an alias for b-min for consistency) might be acceptable?
I'm fine with shorter aliases, but we should definitely have the longer forms for clarity.
After some more thought, I'd propose b-orig and b-min. The minimized form is always pure, but the original isn't always impure, so the pure vs impure terminology may not be universally accurate.
I agree that "min" (for "minimized") is clearer than "pure" and I prefer it. Greg's point that not all orig are impure is an even better argument.
I would be slightly inclined to omit the short aliases bo and bm, because shorter forms are more obscure, typos are easier to make and easier to overlook, and the documentation has to be longer to explain them. I don't feel extremely strongly about this, though.