Question: how do I use top-p sampling?
When running the gigaword task, I get the error below:
UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
unfin_idx = bbsz_idx // beam_size
../aten/src/ATen/native/cuda/MultinomialKernel.cu:214: sampleMultinomialOnce: block: [4,0,0], thread: [0,0,0] Assertion sum > accZero failed.
My command:
python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} --master_port=${MASTER_PORT} ../../evaluate.py \
    ${data} \
    --path=${path} \
    --user-dir=${user_dir} \
    --bpe=bert \
    --task=gigaword \
    --batch-size=16 \
    --log-format=simple --log-interval=10 \
    --seed=7 \
    --gen-subset=${split} \
    --results-path=${result_path} \
    --sampling \
    --sampling-topk 10 \
    --sampling-topp 0.7 \
    --beam=6 \
    --lenpen=0.7 \
    --max-len-b=32 \
    --no-repeat-ngram-size=3 \
    --fp16 \
    --num-workers=0 \
    --model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\"}"
Why are you considering top-p sampling? We have no experience with it in this repo. It is probably still better to use beam search, following our practice, to get good results.
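For reference, the `Assertion sum > accZero failed` message comes from the CUDA multinomial kernel, which expects the distribution it samples from to sum to a positive value. One possible cause (an assumption, not verified against this repo) is that the probabilities become all zero or NaN before sampling, for example under `--fp16`. The sketch below is a plain-Python illustration of what top-p (nucleus) filtering does and of the kind of validity check that would catch such a distribution early. It is not this repo's implementation; `top_p_filter` is a hypothetical helper name.

```python
import math


def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, zero out the rest, and renormalize.

    Hypothetical helper for illustration only; not taken from this repo.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    total = sum(probs)
    # The same condition the multinomial kernel asserts on: the distribution
    # must sum to a positive, finite value (NaN or all-zero input fails here).
    if not math.isfinite(total) or total <= 0.0:
        raise ValueError("probabilities must be finite and sum to a positive value")

    # Visit token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / total
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens so the result sums to 1.
    kept_mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / kept_mass
    return filtered


if __name__ == "__main__":
    print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.7))
```

With `top_p=0.7`, the example keeps only the two most probable tokens (cumulative mass 0.8) and rescales them to sum to 1. If a check like the one above fails on the real model outputs, running without `--fp16` is one way to test whether half-precision is involved.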