Hi! Thanks for your great work. I tried to reproduce the baseline tasks, but my results were lower than those reported in the paper, so I am not sure whether I used...
Hi! Thanks for the great work. When reproducing inference on PopQA with Self-RAG, I got the same score for adaptive_retrieval and always_retrieve. In theory, the adaptive_retrieval result should be...
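(For context on why the two modes should differ: in Self-RAG, adaptive_retrieval only retrieves when the model's own preference for the special [Retrieval] token clears a threshold, so a very low threshold makes it behave like always_retrieve. The sketch below is illustrative only and is not the repository's code; the variable names and the normalization are assumptions.)

```python
# Illustrative sketch of the adaptive retrieval decision (not run_short_form.py's code).
# ret_prob / noret_prob are assumed to be the LM's probabilities for the special
# [Retrieval] and [No Retrieval] tokens at the point where retrieval is considered.
def should_retrieve(ret_prob: float, noret_prob: float, threshold: float = 0.2) -> bool:
    score = ret_prob / (ret_prob + noret_prob)  # normalized preference for retrieving
    return score > threshold  # threshold near 0 makes this behave like always_retrieve
```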
Hello, I'm trying to reproduce paper numbers on PopQA by running the following command (under "Question Answering"):

```
python run_short_form.py \
  --model_name selfrag/selfrag_llama2_7b \
  --input_file eval_data/popqa_longtail_w_gs.jsonl \
  --mode MODE --max_new_tokens 100...
```
Hi! Thanks for your great work. I am trying to reproduce the baseline for ASQA using Llama-2-7b-hf, like this:

```
python run_baseline_lm.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --input_file eval_data/asqa_eval_gtr_top100.json \
  --max_new_tokens 300 --metric...
```
Hi! Thanks for the great work. I found an error in run_short_form.py when testing it on my own multiple-choice data:

```
if len(results) == 1:
    postprocessed_pred = postprocess_answer_option_conditioned(pred)
    return...
```
Hi! Thanks for your great work. I see that the task parameter in all of the example commands (run_baseline_lm.py) is 'qa', but in Self-RAG the task can take other values, such as arc_c, fever,...
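(For illustration, a hedged guess at how the flag would be passed for one of those tasks; the input file path is a placeholder and any task-specific flags such as the metric are omitted here:)

```
python run_baseline_lm.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --input_file PATH/TO/ARC_EVAL_FILE.jsonl \
  --task arc_c \
  --max_new_tokens 100
```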
Hello, thanks for providing this amazing tool. Could mergoo support QWEN models?
Thanks for providing the wonderful tool TRL. I have a question. With packing=True and eval_packing=False, training failed with KeyError: 'eval_loss'. However, when I removed eval_packing=False (in which case eval_packing will...
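(A minimal repro sketch under assumptions: the model and dataset are placeholders, packing/eval_packing are assumed to live on SFTConfig as in recent TRL, and eval_strategy may be called evaluation_strategy on older transformers versions:)

```python
# Minimal sketch of the reported combination (placeholder model/dataset, recent TRL assumed).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Small slice of a public chat dataset, split into train/eval for the repro.
dataset = load_dataset("trl-lib/Capybara", split="train[:2%]").train_test_split(test_size=0.1)

config = SFTConfig(
    output_dir="sft-packing-repro",
    packing=True,           # pack training examples into fixed-length sequences
    eval_packing=False,     # the combination reported to raise KeyError: 'eval_loss'
    eval_strategy="steps",  # "evaluation_strategy" on older transformers versions
    eval_steps=10,
    max_steps=20,
)

trainer = SFTTrainer(
    model="facebook/opt-350m",  # placeholder model
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()  # the first evaluation step is where the KeyError was reported
```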