repair search space
Marker issue for @sedflix's work.
Some thoughts on the module designs. Note: this is just to get a discussion started.
- a new mode type named?
- a new CLI argument called "repair-tool-name"
- There are two conditions:
  - repair-tool-name is specified: all revisions will be classified as yes or no
  - repair-tool-name is not specified: all revisions will be classified with multiclass labels
- The module
  - runs a mineinstance job for each repair-tool with multiple pattern files we have made manually for each repair-tool.
  - The output of the mineinstance job is passed to another filter for that specific repair-tool to get the final output. The filter will be used to check conditions which can't be checked using patterns.
  - the output could be similar to that of mineinstance with an extra field called probable-repair-tools
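To make the two modes concrete, here is a minimal Python sketch of the labeling rule only. It is an illustration, not Coming code: the function name `classify_revision`, its inputs, and the tool name `OtherTool` are hypothetical, and it assumes the list of tools whose patterns (and filter) matched a revision is already known.

```python
from typing import List, Optional, Union


def classify_revision(
    matched_tools: List[str],
    repair_tool_name: Optional[str] = None,
) -> Union[bool, List[str]]:
    """Label one revision given the tools whose patterns matched it.

    Hypothetical helper: `matched_tools` stands in for whatever the
    mineinstance job plus the per-tool filter would report.
    """
    if repair_tool_name is not None:
        # repair-tool-name specified: binary yes/no answer for that tool.
        return repair_tool_name in matched_tools
    # repair-tool-name not specified: multiclass labels, one per matching tool.
    return sorted(set(matched_tools))


# Binary mode for a single tool.
print(classify_revision(["JMutRepair"], "JMutRepair"))
# Multiclass mode: every tool that matched.
print(classify_revision(["OtherTool", "JMutRepair"]))
```

The multiclass branch returns a list rather than a single label so that one revision can be attributed to several tools.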
What do you think? @martinezmatias @monperrus
LGTM.
The name of the module / CLI can be repairability-analysis or analyze-repairability
Hi @sedflix Perfect.
> runs a mineinstance job for each repair-tool with multiple pattern files we have made manually for each repair-tool. The output of the mineinstance job is passed to another filter for that specific repair-tool to get the final output. The filter will be used to check conditions which can't be checked using patterns
It makes sense. The challenge, as you mention, is to determine which "checks" must go into this new filter and which ones can be incorporated into the "mine instance" analyzer by improving our current pattern specification.
Cool. I will try to avoid the use of "checks"; if they are required, I will most probably open an issue here to discuss including such features in the pattern specification itself.
#74 represents the ongoing work.
The current flow of the module is as follows:
- apply FineGrainDifftAnalyzer to the input
- extract all the patterns that need to be mined using fr.inria.coming.repairability.RepairTools
Right now, I'm using the JSonPatternInstanceOutput. The pattern name of the instance specifies its label.
Does an output like this make sense and is it okay? repairability is an array so that a single revision can be classified into multiple tools.
{
  "instances": [
    {
      "revision": "patch1-Chart-26-jMutRepair",
      "repairability": [
        {
          "tool-name": "JMutRepair",
          "pattern-name": "JMutRepair:unary",
          "instance_detail": [
            {
              "pattern_action": "ANY",
              "pattern_entity": {
                "entity_type": "UnaryOperator",
                "entity_new value": "*",
                "entity_role": "*",
                "entity_parent": "null"
              },
              "concrete_change": {
                "operator": "INS",
                "src_type": "UnaryOperator",
                "dst_type": "null",
                "src": "(!b1)",
                "dst": "null",
                "src_parent_type": "BinaryOperator",
                "dst_parent_type": "null",
                "src_parent": "(!b1) || b2",
                "dst_parent": "null"
              },
              "file": "/test",
              "line": 2538
            }
          ]
        }
      ]
    },
    {
      "revision": "patch1-Chart-7-jMutRepair",
      "repairability": [
        {
          "tool-name": "JMutRepair",
          "pattern-name": "JMutRepair:binary",
          "instance_detail": [
            {
              "pattern_action": "UPD",
              "pattern_entity": {
                "entity_type": "BinaryOperator",
                "entity_new value": "*",
                "entity_role": "*",
                "entity_parent": "null"
              },
              "concrete_change": {
                "operator": "UPD",
                "src_type": "BinaryOperator",
                "dst_type": "BinaryOperator",
                "src": "dataset != null",
                "dst": "dataset == null",
                "src_parent_type": "If",
                "dst_parent_type": "If",
                "src_parent": "if (dataset != null) {\n return result;\n}",
                "dst_parent": "if (dataset == null) {\n return result;\n}"
              },
              "file": "/test",
              "line": 2370
            }
          ]
        }
      ]
    }
  ]
}
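For anyone consuming this output downstream, a minimal Python sketch of reading it could look like the following. It assumes only the field names shown in the sample above ("instances", "revision", "repairability", "tool-name"); the helper name `tools_per_revision` is hypothetical and the embedded sample is abbreviated, with "instance_detail" left empty.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Abbreviated copy of the sample output; "instance_detail" is emptied for brevity.
SAMPLE = """
{"instances": [
  {"revision": "patch1-Chart-26-jMutRepair",
   "repairability": [{"tool-name": "JMutRepair",
                      "pattern-name": "JMutRepair:unary",
                      "instance_detail": []}]},
  {"revision": "patch1-Chart-7-jMutRepair",
   "repairability": [{"tool-name": "JMutRepair",
                      "pattern-name": "JMutRepair:binary",
                      "instance_detail": []}]}
]}
"""


def tools_per_revision(output: dict) -> Dict[str, List[str]]:
    """Collect, for each revision, the set of tools it was classified into."""
    labels = defaultdict(set)
    for instance in output["instances"]:
        tools = labels[instance["revision"]]  # creates an entry even if nothing matched
        for match in instance["repairability"]:
            tools.add(match["tool-name"])
    return {revision: sorted(tools) for revision, tools in labels.items()}


labels = tools_per_revision(json.loads(SAMPLE))
for revision, tools in labels.items():
    print(revision, tools)
```

Because "repairability" is an array, the same loop handles a revision matched by several tools without any change.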
Hi @sedflix, I would say that's okay: you added to the instance detection the information that the module needs: 1) the repair tool ("tool-name": "JMutRepair") and 2) the repair applied ("pattern-name": "JMutRepair:binary").
Hi @sedflix, FYI: I am implementing a change to avoid the hardcoded "file": "/test". It touches both Gt-Spoon and Coming. PR soon.
Hi @martinezmatias,
If I'm correct IntermediateResultProcessorCallback is called after execution of all the analyzers and before the execution of output processors?
Therefore, IntermediateResultProcessorCallback will be an appropriate way to implement the filter as discussed above.
What do you think?
Hi @sedflix
> If I'm correct IntermediateResultProcessorCallback is called after execution of all the analyzers and before the execution of output processors?
Yes. It's called once all analyzers are executed.
> Therefore, IntermediateResultProcessorCallback will be an appropriate way to implement the filter as discussed above.

I'd say that's not a good place for that functionality. A better option, IMHO, is to create a new Analyzer. Note that Coming creates a pipe of analyzers, where the results from an analyzer are passed forward. Thus, I would add a new analyzer that takes the pattern detection output and refines the matching.
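As a rough illustration of that suggestion, here is a language-neutral sketch in Python of a pipe where a second analyzer refines what the pattern detector produced. None of this is Coming's Java API: `pattern_detector`, `refinement_analyzer`, `run_pipeline`, the `changed_lines` field, and the size threshold are all made up for the example.

```python
from typing import Callable, Dict, List

Match = Dict[str, object]
Analyzer = Callable[[List[Match]], List[Match]]


def pattern_detector(_previous: List[Match]) -> List[Match]:
    """Stand-in for pattern mining; returns hard-coded matches for the demo."""
    return [
        {"revision": "r1", "tool-name": "JMutRepair", "changed_lines": 1},
        {"revision": "r2", "tool-name": "JMutRepair", "changed_lines": 40},
    ]


def refinement_analyzer(matches: List[Match]) -> List[Match]:
    """Keep only matches that pass a check patterns cannot express.

    The condition (at most 5 changed lines) is purely hypothetical.
    """
    return [m for m in matches if m["changed_lines"] <= 5]


def run_pipeline(analyzers: List[Analyzer]) -> List[Match]:
    """Run analyzers in order, passing each result forward to the next one."""
    results: List[Match] = []
    for analyzer in analyzers:
        results = analyzer(results)
    return results


refined = run_pipeline([pattern_detector, refinement_analyzer])
print([m["revision"] for m in refined])
```

Keeping the refinement as its own stage means the pattern detection stays untouched, and any check that later becomes expressible in the pattern specification can simply be dropped from the second stage.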
Cool!
> FYI: I am implementing one change to avoid having the hardcoded "file": "/test". I am changing Gt-Spoon and Coming. PR soon.
Implemented and merged in both GT-Spoon and Coming. PR #78
Hey @martinezmatias and @monperrus, what do you think about how to proceed with the quantitative analysis of the repairability module, in particular the false-positive and true-negative cases?
The current dataset lets us consider only true-positive and false-negative cases!
When you have a dataset with ground truth classification (such as DRR) we have all four cases. Correct?
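A minimal Python sketch of how the four cases could be counted once ground truth is available, for a single tool in yes/no mode. The function name `confusion_counts` and the example labels are invented for illustration; they are not results from DRR or any other dataset.

```python
from typing import Dict


def confusion_counts(predicted: Dict[str, bool], truth: Dict[str, bool]) -> Dict[str, int]:
    """Count TP/FP/FN/TN for one tool; revisions missing from `predicted` count as 'no'."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for revision, is_repairable in truth.items():
        said_yes = predicted.get(revision, False)
        if said_yes and is_repairable:
            counts["tp"] += 1
        elif said_yes and not is_repairable:
            counts["fp"] += 1
        elif not said_yes and is_repairable:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Made-up labels: r1/r2 are truly in the tool's search space, r3/r4 are not.
truth = {"r1": True, "r2": True, "r3": False, "r4": False}
predicted = {"r1": True, "r2": False, "r3": True, "r4": False}

counts = confusion_counts(predicted, truth)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
print(counts, precision, recall)
```

With a dataset made only of patches produced by the tool itself, every `truth` value is `True`, so the `fp` and `tn` counters can never move; that is why only true positives and false negatives are observable there.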
See "Estimating the Potential of Program Repair Search Spaces with Commit Analysis" (Khashayar Etemadi, Niloofar Tarighat, Siddharth Yadav, Matias Martinez and Martin Monperrus), Journal of Systems and Software, 2022.