rank_llm

Reorder

Open XKTZ opened this issue 5 months ago

Pull Request Checklist

Reference Issue

This is a superset of the "Top Down" issue. This PR reorganizes the various reordering methods, including sliding window, top-down, and ListT5's tournament sort. A reorderer can now be specified on the command line, for example: --reorder_policy="top_down:{\"top_k\": 10, \"pivot\": ${PIVOT}, \"shuffle\": true, \"r\": 1}"
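After the shell removes the backslash escapes, the --reorder_policy value has the form name:{JSON object}, for example top_down:{"top_k": 10, "pivot": 11}. The following is a minimal sketch of how such a spec could be split into a policy name and keyword arguments. It assumes that the name ends at the first colon. parse_reorder_policy is a hypothetical helper and not the PR's actual parser.

```python
import json
from typing import Any, Dict, Tuple


def parse_reorder_policy(spec: str) -> Tuple[str, Dict[str, Any]]:
    """Split 'name:{json}' into (name, kwargs). Hypothetical helper.

    Assumes the policy name is everything before the first ':' and the
    remainder, if present, is a JSON object of keyword arguments.
    """
    name, sep, args = spec.partition(":")  # JSON may contain ':', so split once
    name = name.strip()
    if not name:
        raise ValueError(f"missing policy name in {spec!r}")
    kwargs = json.loads(args) if sep and args.strip() else {}
    if not isinstance(kwargs, dict):
        raise ValueError(f"policy arguments must be a JSON object, got {args!r}")
    return name, kwargs


name, kwargs = parse_reorder_policy(
    'top_down:{"top_k": 10, "pivot": 11, "shuffle": true, "r": 1}'
)
print(name, kwargs)
```

A bare name such as sliding_window parses to an empty argument dict. Using a JSON payload keeps each policy free to define its own options (step, pivot, r, and so on) without adding a new CLI flag per option.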

ref:

Checklist Items

Before submitting your pull request, please review these items:

  • [ ] Have you followed the contributing guidelines?
  • [ ] Have you verified that there are no existing Pull Requests for the same update/change?
  • [ ] Have you updated any relevant documentation or added new tests where needed?

PR Type

What kind of change does this PR introduce?

  • [ ] Bugfix
  • [ ] Feature
  • [ ] Code style update (formatting, local variables)
  • [ ] Refactoring (no functional changes, no api changes)
  • [ ] Documentation content changes
  • [ ] Other...
    • Description:

Reproduce

The following shell script reproduces the new functionality:

#!/usr/bin/env bash
# Rerank the top 100 SPLADE++ candidates on dl19 and dl20 with a chosen reorder policy.
DATASETS="dl19 dl20"
WINDOW_SIZE="20"

for dataset in $DATASETS; do
  for window in $WINDOW_SIZE; do
    # Pivot passed to the top-down policy: 11 for a window of 20, 9 otherwise.
    if [[ $window == "20" ]]; then
      PIVOT=11
    else
      PIVOT=9
    fi

    # Top-down reordering
    python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
      --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
      --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
      --variable_passages --reorder_policy="top_down:{\"top_k\": 10, \"pivot\": ${PIVOT}}" \
      --window_size=${window}

    # Sliding window reordering (uncomment to run)
    #python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
    #  --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
    #  --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
    #  --variable_passages --reorder_policy="sliding_window:{\"step\": 10}" \
    #  --window_size=${window}

    # Tournament sort reordering (uncomment to run)
    #python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
    #  --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
    #  --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
    #  --variable_passages --reorder_policy="tournament_sort:{\"step\": 10, \"r\": 1}" \
    #  --window_size=${window}
  done
done
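The top-down policy exercised by the script partitions candidates around a pivot. The following is a simplified, hypothetical sketch of that idea. top_down, perfect_rank, and the 0-based reading of pivot are illustrative assumptions and not rank_llm code. The sketch ranks the first window and takes the document at position pivot as the pivot. It then compares the remaining candidates against the pivot window_size - 1 at a time and recurses only on documents ranked above it. It models only window_size and pivot; the PR's top_k, shuffle, and r options are not reproduced.

```python
from typing import Callable, List, Sequence

# A window ranker takes up to window_size doc ids and returns them best-first.
RankFn = Callable[[Sequence[int]], List[int]]


def top_down(candidates: List[int], rank_window: RankFn,
             window_size: int, pivot: int) -> List[int]:
    """Simplified top-down partitioning (hypothetical sketch, not rank_llm code)."""
    if window_size < 2 or not 0 <= pivot < window_size:
        raise ValueError("need window_size >= 2 and 0 <= pivot < window_size")
    if len(candidates) <= window_size:
        return rank_window(candidates)

    # 1. Rank the first window and pick the document at position `pivot`.
    head = rank_window(candidates[:window_size])
    pivot_doc = head[pivot]
    above, below = head[:pivot], head[pivot + 1:]

    # 2. Compare the remaining documents with the pivot, window_size - 1 at a
    #    time, so every call (batch + pivot) still fits in one window.
    rest = candidates[window_size:]
    step = window_size - 1
    for start in range(0, len(rest), step):
        ranked = rank_window(rest[start:start + step] + [pivot_doc])
        cut = ranked.index(pivot_doc)
        above.extend(ranked[:cut])
        below.extend(ranked[cut + 1:])

    # 3. Only documents that beat the pivot are ranked further; the rest keep
    #    their partitioned (unsorted) order behind the pivot.
    return top_down(above, rank_window, window_size, pivot) + [pivot_doc] + below


# Usage with a mock "perfect" ranker driven by known relevance scores.
scores = {doc: (doc * 37) % 100 for doc in range(100)}  # distinct scores 0..99


def perfect_rank(docs: Sequence[int]) -> List[int]:
    return sorted(docs, key=lambda d: scores[d], reverse=True)


result = top_down(list(range(100)), perfect_rank, window_size=20, pivot=10)
print(result[:5])
```

With a perfect ranker, the first pivot + 1 positions are exactly the best pivot + 1 documents in order. Positions after the pivot are only partitioned, not sorted. The recursion always terminates because each level drops at least window_size - pivot documents.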

File changes

rank_fid.py, rank_gpt.py, rank_listwise_os_llm.py: now call the rerank_batch function in listwise_rankllm.py directly.

listwise_rankllm.py: rerank_batch replaces the original reranking method. It wraps the model methods needed for reranking in a ModelFunction and passes that object to the reorder policy, which performs the reordering.

reorder_policy.py: defines the reorder policies.

top_down/tournament...: implementations of the individual policies.

xxx_reranker: adds a reorder policy parameter, which defaults to sliding window.

rankllm.py: create_prompt now takes the selected indices instead of a contiguous range.

reranker.py, run_rank_llm.py: parameter changes to support reorder policies (run_rank_llm.py now accepts --reorder_policy).
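The file changes describe how rerank_batch hands a ModelFunction to a reorder policy, and how create_prompt works from selected indices rather than a range. The following is a minimal sketch of that division of labour using the sliding window policy, which is the default per the xxx_reranker entry. Apart from the names ModelFunction and create_prompt, everything here is a hypothetical stand-in and not the PR's actual interface. That includes ModelFunction's fields, the create_prompt signature, SlidingWindowPolicy, and the mock fake_execute model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ModelFunction:
    """Hypothetical bundle of the model calls a reorder policy needs."""
    create_prompt: Callable[[Sequence[str], Sequence[int]], str]
    execute: Callable[[str], List[int]]  # prompt -> permutation of prompt positions


def create_prompt(passages: Sequence[str], indices: Sequence[int]) -> str:
    # Builds the prompt from an arbitrary selection of passages, not a range.
    return "\n".join(f"[{pos + 1}] {passages[i]}" for pos, i in enumerate(indices))


class SlidingWindowPolicy:
    """Back-to-front sliding window in the style of RankGPT (hypothetical class)."""

    def __init__(self, window_size: int = 20, step: int = 10):
        if not 1 <= step <= window_size:
            raise ValueError("need 1 <= step <= window_size")
        self.window_size = window_size
        self.step = step

    def reorder(self, passages: Sequence[str], model_fn: ModelFunction) -> List[int]:
        order = list(range(len(passages)))  # current ranking as passage indices
        if not order:
            return order
        end = len(order)
        start = max(0, end - self.window_size)
        while True:
            selected = order[start:end]
            prompt = model_fn.create_prompt(passages, selected)
            perm = model_fn.execute(prompt)
            order[start:end] = [selected[p] for p in perm]  # apply window ranking
            if start == 0:
                break
            end -= self.step  # slide the window toward the top of the list
            start = max(0, end - self.window_size)
        return order


def fake_execute(prompt: str) -> List[int]:
    # Mock LLM: ranks prompt lines by the integer after "relevance=".
    lines = prompt.splitlines()
    rel = [int(line.rsplit("=", 1)[1]) for line in lines]
    return sorted(range(len(lines)), key=lambda i: rel[i], reverse=True)


scores = [(i * 37) % 100 for i in range(100)]
passages = [f"relevance={s}" for s in scores]
model_fn = ModelFunction(create_prompt, fake_execute)
order = SlidingWindowPolicy(window_size=20, step=10).reorder(passages, model_fn)
print(order[:3])
```

Because the policy sees only a ModelFunction, the same model wrapper can drive top-down or tournament sort without changes. With a perfect ranker and step at most half the window, one back-to-front pass leaves the best step passages at the top in order.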

XKTZ, Sep 15 '24 20:09