rank_llm
Reorder
Pull Request Checklist
Reference Issue
This PR is a superset of the Top Down issue. It reorganizes the reordering methods, including sliding window, top down, and ListT5's tournament sort. A reorderer can now be specified on the command line with a flag such as --reorder_policy="top_down:{\"top_k\": 10, \"pivot\": ${PIVOT}, \"shuffle\": true, \"r\": 1}".
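The flag value has the shape `name:{json options}`. As an illustration only, here is a minimal sketch of how such a string could be split into a policy name and keyword arguments. The helper name `parse_reorder_policy` is hypothetical and is not taken from this PR; the actual parsing in the PR may differ.

```python
import json
from typing import Any, Dict, Tuple


def parse_reorder_policy(spec: str) -> Tuple[str, Dict[str, Any]]:
    """Split 'name:{json}' into (name, options). Hypothetical helper, not the PR's code."""
    # Split on the first colon only; the JSON part contains colons of its own.
    name, sep, raw = spec.partition(":")
    name = name.strip()
    if not name:
        raise ValueError(f"missing policy name in {spec!r}")
    # A bare policy name (no ':{...}' suffix) is treated as "no options".
    options = json.loads(raw) if sep and raw.strip() else {}
    if not isinstance(options, dict):
        raise ValueError("policy options must be a JSON object")
    return name, options


if __name__ == "__main__":
    name, options = parse_reorder_policy(
        'top_down:{"top_k": 10, "pivot": 11, "shuffle": true, "r": 1}'
    )
    print(name, options)
    # top_down {'top_k': 10, 'pivot': 11, 'shuffle': True, 'r': 1}
```

Splitting on the first colon keeps the JSON payload intact, which is why the shell command has to escape the inner double quotes.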
ref:
Checklist Items
Before submitting your pull request, please review these items:
- [ ] Have you followed the contributing guidelines?
- [ ] Have you verified that there are no existing Pull Requests for the same update/change?
- [ ] Have you updated any relevant documentation or added new tests where needed?
PR Type
What kind of change does this PR introduce?
- [ ] Bugfix
- [ ] Feature
- [ ] Code style update (formatting, local variables)
- [ ] Refactoring (no functional changes, no api changes)
- [ ] Documentation content changes
- [ ] Other...
- Description:
Reproduce
The following shell script reproduces the new functionality:
DATASETS="dl19 dl20"
WINDOW_SIZE="20"
for dataset in $DATASETS; do
  for window in $WINDOW_SIZE; do
    # The pivot position depends on the window size.
    if [[ $window == "20" ]]; then
      PIVOT=11
    else
      PIVOT=9
    fi
    # Top-down reordering (active run).
    python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
      --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
      --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
      --variable_passages --reorder_policy="top_down:{\"top_k\": 10, \"pivot\": ${PIVOT}}" \
      --window_size=${window}
    # Sliding-window reordering (uncomment to run).
    #python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
    #  --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
    #  --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
    #  --variable_passages --reorder_policy="sliding_window:{\"step\": 10}" \
    #  --window_size=${window}
    # Tournament-sort reordering (uncomment to run).
    #python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
    #  --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
    #  --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
    #  --variable_passages --reorder_policy="tournament_sort:{\"step\": 10, \"r\": 1}" \
    #  --window_size=${window}
  done
done
File changes
- rank_fid.py, rank_gpt.py, rank_listwise_os_llm.py: These now call listwise_rankllm.py's rerank_batch function directly.
- listwise_rankllm.py: rerank_batch deprecates the original method. It uses ModelFunction to capture the methods needed for reranking and passes them to the reorder policies.
- reorder_policy.py: The various reorder policies.
- top_down/tournament...: The implementations of the individual policies.
- xxx_reranker: Adds a reorder policy parameter, which defaults to sliding window.
- rankllm.py: create_prompt now depends on selected indices instead of a range.
- reranker.py, run_rank_llm.py: Parameter changes.
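To illustrate the idea of a reorder policy that receives the model's ranking ability as a callback and works on selected indices rather than a contiguous range, here is a rough sketch of the sliding window policy. Everything in it is an assumption for illustration: `RankFn` stands in for whatever ModelFunction exposes, and `sliding_window_reorder` is a hypothetical name, not the PR's implementation.

```python
from typing import Callable, List

# Hypothetical stand-in for the model callback: given selected candidate
# indices, return the same indices in ranked order (best first).
RankFn = Callable[[List[int]], List[int]]


def sliding_window_reorder(
    n: int, rank_fn: RankFn, window_size: int = 20, step: int = 10
) -> List[int]:
    """Rerank n candidates back-to-front with overlapping windows (sketch)."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    order = list(range(n))  # order[i] is the candidate currently at rank i
    end = n
    while end > 0:
        start = max(0, end - window_size)
        # Pass the selected indices, not a (start, end) range, to the model.
        order[start:end] = rank_fn(order[start:end])
        if start == 0:
            break
        end -= step  # slide the window toward the top of the list
    return order


if __name__ == "__main__":
    # Toy relevance scores: a higher candidate index means a higher score.
    scores = list(range(30))
    toy_rank = lambda idxs: sorted(idxs, key=lambda i: -scores[i])
    order = sliding_window_reorder(30, toy_rank, window_size=10, step=5)
    print(order[:5])  # [29, 28, 27, 26, 25]
```

Because the policy only sees a callback, the same loop can drive any of the listwise rerankers, and other policies can choose non-contiguous index sets without changing the model code.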