`defect` and `refine` tasks for `codet5p-16b`

Open cyw3 opened this issue 1 year ago • 4 comments

Hi, nice project!

Do you have checkpoints and evaluation results for the defect and refine tasks for codet5p-16b?

I tested the codet5 defect and refine task examples in CodeTF and found that the effect is not very good.

cyw3, May 25 '23 07:05

Hi, thanks for the question. May I know what you mean by the "effect"? Note that we haven't concluded the project or officially released it yet, so there are still things to improve.

bdqnghi, May 25 '23 08:05

  • For Salesforce/codet5-base-codexglue-defect:
from codetf.models import load_model_pipeline

code_snippets = """
#include <stdio.h>
int main()
{
    int *pointer_var;
    printf("Address of the given pointer variable: %d", pointer_var);
    printf("Value of pointer variable is : %d", * pointer_var);
    return 0;
}
"""
defect_model = load_model_pipeline(model_name="codet5", task="defect",
            model_type="base", is_eval=True,
            load_in_8bit=True, weight_sharding=False)
defects = defect_model.predict([code_snippets])
print(defects)

code_snippets: a C/C++ demo with a null pointer error (it dereferences an uninitialized pointer). But Salesforce/codet5-base-codexglue-defect does not consider it a defect and outputs ['false'].


  • For Salesforce/codet5-base-codexglue-refine-medium:
from codetf.models import load_model_pipeline

code_snippets = """
class GFG
{
    public void test ()
    {
        // Initializing String variable with null value
        String ptr = null;

        // This line of code throws NullPointerException
        // because ptr is null
        if (ptr.equals("gfg"))
            System.out.print("Same");
        else
            System.out.print("Not Same");
    }
}
"""
refine_model = load_model_pipeline(model_name="codet5", task="refine",
            model_type="base", is_eval=True,
            load_in_8bit=True, weight_sharding=False)
refines = refine_model.predict([code_snippets])
print(refines)

code_snippets: a Java demo with a null pointer error. But Salesforce/codet5-base-codexglue-refine-medium outputs a "fixed" version with more defects:

public void test ( ) {
    // Initializing String variable with null value
    String ptr = null ;
    // This line of code throws NullPointerException because ptr is null
    if ( ptr. equals ( "gfg" ) )
        java.lang.System.out.print ( "Same" ) ;
    else
        java.lang.System.out.print ( "Not Same" ) ;
}
  • defect 1: ptr.equals will still throw a null pointer error.
  • defect 2: System.out.print was rewritten as java.lang.System.out.print.
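To see exactly what the refine model changed, it can help to compare input and output token by token, so that spacing and comment differences drop out. Below is a minimal sketch using Python's standard difflib. The helpers tokenize and token_changes are hypothetical names and not part of CodeTF, the comment stripping is naive (it would also cut a `//` inside a string literal), and the two snippets are shortened from the example above.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    """Split code into words and single punctuation marks, dropping // comments."""
    code = re.sub(r"//[^\n]*", "", code)  # naive: ignores // inside string literals
    return re.findall(r"\w+|[^\w\s]", code)


def token_changes(before: str, after: str) -> list[tuple[str, str, str]]:
    """Return (operation, old tokens, new tokens) for every non-identical span."""
    a, b = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Shortened input and model output from the refine example.
before = '''
if (ptr.equals("gfg"))
    System.out.print("Same");
'''
after = '''
if ( ptr. equals ( "gfg" ) )
    java.lang.System.out.print ( "Same" ) ;
'''

for change in token_changes(before, after):
    print(change)
# prints ('insert', '', 'java . lang .')
```

On these lines the only token-level change is the inserted `java.lang.` qualifier, so the `ptr.equals` call on a null reference is left untouched.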

cyw3, May 25 '23 08:05

I see, thanks for this. CodeT5-defect is fine-tuned on this dataset from Microsoft CodeXGLUE: https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection. In fact, this dataset is mostly used for academic research; applying the results to real-world applications can produce unsatisfactory results in many cases.

You can find more on the results in the CodeT5 paper: https://aclanthology.org/2021.emnlp-main.685.pdf.
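A single snippet says little about how a classifier behaves on your own code, so one option is to score it on a small labeled set. The sketch below is only an illustration: evaluate_defect_predictions and always_false are hypothetical helpers, not part of CodeTF, and it assumes the predictor returns the strings 'true' or 'false' (only ['false'] was observed above). The stand-in predictor could be swapped for defect_model.predict.

```python
from typing import Callable, Sequence


def evaluate_defect_predictions(
    predict: Callable[[list[str]], Sequence[str]],
    samples: Sequence[tuple[str, bool]],
) -> dict[str, float]:
    """Score a defect predictor on (code, is_defective) pairs."""
    if not samples:
        raise ValueError("samples must not be empty")
    codes = [code for code, _ in samples]
    labels = [label for _, label in samples]
    # Assumption: predictions are the strings 'true' / 'false'.
    preds = [str(p).strip().lower() == "true" for p in predict(codes)]

    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(y and not p for p, y in zip(preds, labels))
    correct = sum(p == y for p, y in zip(preds, labels))
    return {
        "accuracy": correct / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def always_false(codes: list[str]) -> list[str]:
    """Stand-in predictor that never reports a defect."""
    return ["false"] * len(codes)


# Tiny hand-labeled set: one defective snippet, one clean snippet.
samples = [
    ("int main() { int *p; return *p; }", True),
    ("int main() { int x = 0; return x; }", False),
]
print(evaluate_defect_predictions(always_false, samples))
# prints {'accuracy': 0.5, 'precision': 0.0, 'recall': 0.0}
```

Reporting recall next to accuracy matters here: a model that never flags a defect can still look acceptable on accuracy alone when most snippets are clean.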

bdqnghi, May 25 '23 08:05

Thanks.

Do you have any other datasets for the defect and refine tasks to recommend?

cyw3, May 25 '23 10:05