Extremely long time taken for comparison of codes

jackswl opened this issue

Hi all,

Thanks for the wonderful work.

I am currently running code_bert_score to evaluate the similarity between generated code and 'correct' (reference) code. However, it takes far too long locally. Is there a way to speed it up on macOS (e.g. by using a GPU)? Could you point me to where the code could be optimized, or to any settings I should change to make it run faster? Thanks!

```python
import code_bert_score
import pandas as pd

rp_values = [1]

for rp in rp_values:
    CSV_PATH = f'xxx'
    codebertdf = pd.read_csv(CSV_PATH)

    codebertdf['generated_output'] = codebertdf['generated_output'].str.strip()

    predictions = codebertdf['generated_output'].tolist()
    refs = codebertdf['actual_output'].tolist()

    # Calculate CodeBERTScore: score() returns (precision, recall, F1, F3) tensors
    P, R, F1, F3 = code_bert_score.score(cands=predictions, refs=refs, lang='python')

    # Add scores to DataFrame (tensors converted to plain Python lists)
    codebertdf['P'] = P.tolist()
    codebertdf['R'] = R.tolist()
    codebertdf['F3'] = F3.tolist()
    codebertdf['F1'] = F1.tolist()

    # Export DataFrame
    codebertdf.to_csv(f'/xxx', index=False)
```

Aug 30 '24 09:08 jackswl
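Before changing hardware, it can help to put a number on "too long". A rough sketch, where `estimate_total_seconds` is a hypothetical helper (not part of `code_bert_score`): it runs whatever scoring function it is given on only the first `sample_size` pairs and scales the elapsed time linearly to the full dataset.

```python
import time
from typing import Callable, Sequence


def estimate_total_seconds(
    score_fn: Callable[[list, list], object],
    cands: Sequence[str],
    refs: Sequence[str],
    sample_size: int = 50,
) -> float:
    """Time score_fn on the first sample_size pairs and extrapolate linearly.

    Hypothetical helper for illustration; score_fn receives (cands, refs) lists.
    """
    n = min(sample_size, len(cands), len(refs))
    if n == 0:
        return 0.0  # nothing to score, nothing to time
    start = time.perf_counter()
    score_fn(list(cands[:n]), list(refs[:n]))
    elapsed = time.perf_counter() - start
    # Per-pair cost of the sample multiplied by the total number of pairs.
    return elapsed / n * len(cands)
```

With the variables from the question this would be called as `estimate_total_seconds(lambda c, r: code_bert_score.score(cands=c, refs=r, lang='python'), predictions, refs)`. Because the sampled call also pays one-off costs such as loading the model, scaling it up overstates the total, so the result is best read as a rough upper bound.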

Hi Jack, thank you for your interest in our work!

Yes, a GPU will definitely speed this up. You can also use a Google Colab notebook with a GPU runtime.

Best, Uri

— View it on GitHub: https://github.com/neulab/code-bert-score/issues/10

Aug 30 '24 11:08 urialon
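Uri's suggestion comes down to running the model on a GPU rather than the CPU. A minimal sketch of picking the device, assuming PyTorch (which CodeBERTScore runs on) is installed. `pick_device` is a hypothetical helper, not part of `code_bert_score`. It prefers CUDA (as on a Colab GPU runtime), then Apple's Metal (`mps`) backend on macOS, and otherwise falls back to the CPU.

```python
def pick_device() -> str:
    """Return a PyTorch device string: 'cuda', 'mps', or 'cpu'.

    Hypothetical helper for illustration, not part of code_bert_score.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; report the CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU, e.g. a Colab GPU runtime

    # torch.backends.mps only exists in PyTorch >= 1.12, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"  # Apple-silicon GPU via Metal

    return "cpu"


if __name__ == "__main__":
    print(f"Scoring would run on: {pick_device()}")
```

Assuming `code_bert_score.score` accepts the same `device` keyword argument as the upstream `bert_score.score` it is forked from, the call in the question would become `code_bert_score.score(cands=predictions, refs=refs, lang='python', device=pick_device())`. Not every PyTorch operation is implemented for `mps`; if one is missing, setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` lets PyTorch run those operations on the CPU instead.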