docprompting train_retriever_sup

Could you share the code for generating the training data for simcse? What is the difference between text1 and text2 in train_retriever_sup_unsup.json?

Apr 26 '24 16:04 chenzhongwu

I am very curious about how you generate the files in the provided dataset in https://drive.google.com/file/d/1CzNlo8-e4XqrgAME5zHEWEKIQMPga0xl/view?usp=sharing. What methods did you use to process them from what raw datasets? Thanks!

Apr 28 '24 10:04 chenzhongwu

What is the difference between text1 and text2 in train_retriever_sup_unsup.json?

In the unsupervised setting, text1 and text2 are the same. In the supervised setting, text2 is the natural language intent from CoNaLa and text1 is the description of the function that fulfills the intent.
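To make the text1/text2 distinction concrete, here is a minimal sketch of how such pairs could be assembled. Only the field names text1 and text2 come from this thread; the helper names, the input shapes, and the sample strings are hypothetical illustrations and not the actual script or the actual layout of train_retriever_sup_unsup.json.

```python
from typing import Dict, List, Tuple


def make_unsup_pairs(docs: List[str]) -> List[Dict[str, str]]:
    """Unsupervised setting: text1 and text2 are the same string."""
    return [{"text1": doc, "text2": doc} for doc in docs]


def make_sup_pairs(examples: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Supervised setting: text2 is the NL intent, text1 is the function description.

    `examples` is assumed to be a list of (intent, function_description) tuples.
    """
    return [{"text1": desc, "text2": intent} for intent, desc in examples]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    doc = "sorted(iterable, key=None, reverse=False): return a new sorted list."
    intent = "sort a list of integers in descending order"
    records = make_unsup_pairs([doc]) + make_sup_pairs([(intent, doc)])
    for record in records:
        print(record)
```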

What methods did you use to process them from what raw datasets?

CoNaLa provides NL-code pairs. We use heuristics to extract the functions from the code and find their corresponding documents. Please see Appendix B of the paper for more detailed descriptions.

Let me know if you want to use similar pipelines to generate more NL-doc-code tuples. I can provide a more straightforward approach to generate the data.

May 06 '24 15:05 shuyanzhou

Thanks a lot! I have a few more questions: 1. Could you explain the character-level BLEU evaluation metric in more detail? 2. What is the metric for retrieval performance in Table 4? How do you evaluate whether the retrieved docs are correct? Thanks again!

Jun 10 '24 06:06 chenzhongwu

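Character-level BLEU is calculated as follows:

from sacrebleu.metrics import BLEU

# tokenize='char' makes sacrebleu compute BLEU over characters
bleu = BLEU(tokenize='char')
# corpus_score takes the hypotheses and a list of reference streams
bleu_score = bleu.corpus_score(pred_list, [src_list]).score
metric_list['bleu_char'] = bleu_score

where pred_list (the predictions) and src_list (the references) are both list[str], aligned by index.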

The code to calculate recall is here. Basically, we take the top-k documents from the retriever and check whether the ground-truth document is among them.
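For readers without the repository at hand, a minimal sketch of that top-k check follows. The function name, the argument layout, and the per-query averaging are assumptions made for illustration; the repository's recall code may aggregate differently.

```python
from typing import List, Set


def recall_at_k(retrieved: List[List[str]], gold: List[Set[str]], k: int) -> float:
    """Average, over queries, of the fraction of gold doc ids found in the top-k.

    retrieved[i] is the ranked list of doc ids for query i (best first);
    gold[i] is the set of ground-truth doc ids for query i.
    """
    scores = []
    for ranked, truth in zip(retrieved, gold):
        if not truth:
            continue  # skip queries that have no ground-truth docs
        top_k = set(ranked[:k])
        scores.append(len(top_k & truth) / len(truth))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up doc ids, for illustration only.
    retrieved = [["a", "b", "c"], ["x", "y", "z"]]
    gold = [{"a", "c"}, {"q"}]
    for k in (1, 3):
        print(k, recall_at_k(retrieved, gold, k))
```

Averaging per query keeps queries with many gold documents from dominating the score; pooling all hits across queries would be the other common choice.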

Jun 10 '24 14:06 shuyanzhou
