
Inquiry about Evaluating Code Assistant Products with CodeFuse-AI Evaluation Benchmarks

Open · theshyPika opened this issue 1 year ago · 1 comment

Hi, CodeFuse-AI team,

I am interested in evaluating several code assistant products. However, I do not possess a large-scale code model of my own. Instead, what I have are the responses these code assistants provide to various prompts.

My question is, would it be possible to evaluate these code assistants by creating a dataset from their responses, even in the absence of a large-scale code model? If this is not feasible, could you kindly suggest any alternative approaches?

In the event that this is possible, are there any additional considerations I should be aware of when creating this dataset, apart from the requirements mentioned in the Readme?

I appreciate your guidance and look forward to your response.

Best regards, ck.


theshyPika · Apr 16 '24 09:04

Hi ck,

You can use the datasets provided in this repository to evaluate code assistant products: score each assistant's responses against the benchmark and compare the results, which shows whether the assistants improve coding capability over the base models they rely on. Additionally, depending on the target application scenarios of the assistant products, you can extend the evaluation datasets and metrics within this framework.
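For example, here is a minimal sketch of how collected assistant responses could be packaged into a JSONL generation file for scoring. The field names (`task_id`, `generation`) and the output filename are assumptions for illustration; please check the repository README for the exact schema the evaluation scripts expect.

```python
# Hypothetical sketch: package code-assistant responses into a JSONL
# "generation" file, one record per benchmark task, so an execution-based
# evaluation harness can score them.
# NOTE: the field names "task_id" and "generation" are assumptions --
# confirm the exact schema against the codefuse-evaluation README.
import json

# Responses collected from a code assistant, keyed by benchmark task id.
assistant_responses = {
    "HumanEval/0": "def has_close_elements(numbers, threshold):\n    ...",
    "HumanEval/1": "def separate_paren_groups(paren_string):\n    ...",
}

with open("assistant_generations.jsonl", "w", encoding="utf-8") as f:
    for task_id, completion in assistant_responses.items():
        record = {"task_id": task_id, "generation": completion}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Once each assistant's responses are in this form, you can run the evaluation over each file and compare the resulting metrics (e.g. pass rates) side by side.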

Best regards

HotSummer888 · May 08 '24 09:05