AllenShow

Results 9 issues of AllenShow

Hi! Thanks for your great work. I tried to reproduce the baseline tasks, but my results were lower than those reported in the paper. So I am not sure whether I used...

Hi! Thanks for the great work. When reproducing the inference for PopQA using Self-RAG, I got the same score for adaptive_retrieval and always_retrieve. In theory, the adaptive_retrieval result should be...

Hello, I'm trying to reproduce the paper's numbers on PopQA by running the following command (Question Answering): python run_short_form.py \ --model_name selfrag/selfrag_llama2_7b \ --input_file eval_data/popqa_longtail_w_gs.jsonl \ --mode MODE --max_new_tokens 100...

Hi! Thanks for your great work. I tried to reproduce the baseline for ASQA using Llama-2-7b-hf, as follows: python run_baseline_lm.py \ --model_name meta-llama/Llama-2-7b-hf \ --input_file eval_data/asqa_eval_gtr_top100.json \ --max_new_tokens 300 --metric...

Hi! Thanks for the great work. I found an error in run_short_form.py when testing with my own multiple-choice data: ``` if len(results) == 1: postprocessed_pred = postprocess_answer_option_conditioned(pred) return...

Hi! Thanks for your great work. I noticed that the task parameter in all the example commands (run_baseline_lm.py) is 'qa', but in Self-RAG the task can take other values, such as arc_c, fever,...

Hello, thanks for providing this amazing tool. Could mergoo support QWEN models?

Thanks for providing the wonderful tool TRL. I have a question. With packing=True and eval_packing=False, training failed with KeyError: 'eval_loss'. However, when I removed eval_packing=False (at this point, eval_packing will...