
How to generate my own distillation dataset for the Levenshtein Transformer

Open Ir1d opened this issue 4 years ago • 6 comments

🐛 Bug

The docs say: "The easiest way of performing distillation is to follow the instructions of training a standard transformer model on the same data, and then decode the training set to produce a distillation dataset for NAT".

I want to know what exactly is the process of decoding the training set.

I tried running

fairseq-generate data-bin/wmt17_en_de_joined --path \
checkpoints/transformer_vaswani_wmt_en_de_big/checkpoint_best.pt \
--batch-size 64 --beam 5 --remove-bpe --gen-subset train \
--results-path data-bin/wmt17_en_de_distill

However, no result file was generated anywhere.
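For what it's worth, a workaround that sidesteps `--results-path` is to capture stdout directly, since fairseq-generate prints the S-/T-/H- lines there by default; this is also what the commands later in this thread do. The paths below just mirror the ones above; whether this matches your checkout of fairseq is an assumption:

```shell
# Hedged sketch: redirect fairseq-generate's stdout to a file instead of
# relying on --results-path. Requires fairseq and the binarized data.
fairseq-generate data-bin/wmt17_en_de_joined \
    --path checkpoints/transformer_vaswani_wmt_en_de_big/checkpoint_best.pt \
    --batch-size 64 --beam 5 --gen-subset train \
    > data-bin/wmt17_en_de_distill.txt
```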

I searched the existing issues and it seems no one has solved this yet.

Ir1d avatar Apr 13 '20 00:04 Ir1d

@MultiPath

huihuifan avatar Apr 13 '20 12:04 huihuifan

Hi, thanks. Gao Peng emailed me yesterday and shared his commands:

srun --gres gpu:1 fairseq-generate data-bin/wmt16_en_de_bpe32k --path checkpoint.avg20.pt --beam 4 --lenpen 0.6 --gen-subset train > distill_txt/distill_full_0.txt


python examples/backtranslation/extract_bt_data.py --minlen 1 --maxlen 250 --ratio 3 --output extract_txt/distill_full_0 --srclang en --tgtlang de distill_txt/distill_full_0.txt
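To see roughly what `extract_bt_data.py` does with that file, here is a minimal sketch on a toy input. It assumes the standard fairseq-generate line format (`S-<id>\t<src>`, `T-<id>\t<ref>`, `H-<id>\t<score>\t<hyp>`); the real script additionally restores corpus order by sentence id and applies the `--minlen`/`--maxlen`/`--ratio` filters, which this sketch skips:

```shell
# Toy fairseq-generate output: source (S-), reference (T-), hypothesis (H-).
printf 'S-0\tan example\nT-0\tein Beispiel\nH-0\t-0.1\tein Beispiel\n' > /tmp/distill_sample.txt

# Source side: everything after the first tab on S- lines.
grep '^S-' /tmp/distill_sample.txt | cut -f2- > /tmp/distill.en
# Distilled target side: the hypothesis text after the score on H- lines.
grep '^H-' /tmp/distill_sample.txt | cut -f3- > /tmp/distill.de
```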

After doing that, I ran fairseq-preprocess again to generate the binarized dataset. Then the distillation dataset is good to go.
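For reference, that re-binarization step would look roughly like the sketch below. The dictionary and valid/test paths are assumptions based on the standard WMT'16 En-De setup, not taken from this thread; the key point is reusing the teacher's dictionaries so the student shares its vocabulary:

```shell
# Hedged sketch: binarize the extracted distillation text with the
# teacher's existing dictionaries (paths are illustrative).
fairseq-preprocess --source-lang en --target-lang de \
    --trainpref extract_txt/distill_full_0 \
    --srcdict data-bin/wmt16_en_de_bpe32k/dict.en.txt \
    --tgtdict data-bin/wmt16_en_de_bpe32k/dict.de.txt \
    --destdir data-bin/wmt16_en_de_distill --workers 8
```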

Ir1d avatar Apr 14 '20 13:04 Ir1d

I'm reopening this issue because I couldn't get a reasonable result (not even BLEU 10) when training on my generated distillation dataset, and I'm afraid something is wrong in the generation process.

@MultiPath Could you guide me a little bit on this?

After running the above-mentioned commands, I get a lot of meaningless words in my results.

[screenshot: sample of the generated output containing meaningless words]

The teacher model achieves BLEU > 27, but the student model can't even reach a BLEU of 10. I trained with the commands from https://github.com/pytorch/fairseq/tree/master/examples/nonautoregressive_translation#train-a-model .

Ir1d avatar Apr 17 '20 05:04 Ir1d

Hi, have you solved that? I am facing the same problem.

speedcell4 avatar Jan 08 '21 11:01 speedcell4

Is the problem already solved?

RamoramaInteractive avatar Jan 10 '22 21:01 RamoramaInteractive

> Hi thanks. Gao Peng emailed me yesterday and shared with me his commands.
>
> srun --gres gpu:1 fairseq-generate data-bin/wmt16_en_de_bpe32k --path checkpoint.avg20.pt --beam 4 --lenpen 0.6 --gen-subset train > distill_txt/distill_full_0.txt
>
> python examples/backtranslation/extract_bt_data.py --minlen 1 --maxlen 250 --ratio 3 --output extract_txt/distill_full_0 --srclang en --tgtlang de distill_txt/distill_full_0.txt
>
> After doing that, I ran the preprocess again to generate the binarized dataset. Then the distillation dataset is good to go.

Hi, how can I use the srun command? I have tried many approaches, to no avail.

kkeleve avatar Jul 01 '22 05:07 kkeleve