Sam Shleifer
In your MT experiment, what do you use for weights? Here is my loss fn (modified from https://github.com/huggingface/transformers/blob/master/examples/seq2seq/finetune.py#L151):

```python
lm_logits = outputs[0]  # shape bs, seq_len, vocab_size
assert lm_logits.shape[-1]...
```
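For concreteness, here is roughly how I am plugging the dropper in. This is a sketch rather than my exact branch, and it assumes the `LossDropper(dropc=...)` interface from the loss-truncation README (called on per-sequence losses, returning a 0/1 keep mask):

```python
import torch.nn.functional as F
from loss_dropper import LossDropper  # assumed interface per the loss-truncation README

dropper = LossDropper(dropc=0.3)

def dropper_loss(lm_logits, labels, pad_token_id):
    # Per-token cross entropy, no reduction; padding positions contribute 0.
    tok_loss = F.cross_entropy(
        lm_logits.view(-1, lm_logits.size(-1)),
        labels.view(-1),
        ignore_index=pad_token_id,
        reduction="none",
    ).view(labels.size(0), -1)                      # (bs, seq_len)
    n_tokens = (labels != pad_token_id).sum(dim=1).clamp(min=1)
    seq_loss = tok_loss.sum(dim=1) / n_tokens       # mean loss per sequence
    mask = dropper(seq_loss)                        # 0 where a sequence should be dropped
    return (seq_loss * mask).mean()
```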
After 1 hr, the baseline has BLEU 21.57 (finetuning mBART on wmt-en-ro), while `dropper(dropc=0.3)` has a BLEU of 13.5, which seems to me like a bug.
I have run 4 12h experiments now, starting with your step 1. The key results are:
- `nn.CrossEntropyLoss` (my original loss fn) works better both with and without `dropper` than...
Did not understand it was a second stage, my bad! I tried it on an English-Romanian translator trained on a superset of the WMT English-Romanian dataset in another repo (MarianMT)...
Is there a Mac equivalent?
Awesome. That solves it! This command runs in 20 seconds:

```bash
python parlai/scripts/eval_model.py -t blended_skill_talk \
  -mf zoo:blender/blender_90M/model --metrics ppl --batchsize 32 --skip-generation true
```

Excited for that PR.
Is `--gpu -1` the correct option for multi-GPU? When I run `--gpu -1 -bs 24` on an 8-GPU setup, only 1 GPU gets used. When I run `--gpu -1...
I meant during training, sorry for being horribly unclear. To rephrase, I am trying to (a) train on multiple GPUs and (b) every val step (I know about `--validation_every_n_epochs`) run validation...
Would be super useful. The reason I couldn't figure this out was that

```bash
python examples/train_model.py --help | grep val
python examples/train_model.py --help | grep gpu
```

don't return the...
More specifically, trying `text/scripts/prepro.sh` with py3 results in `TypeError: can't concat str to bytes` from `load_vocab`. Are you guys considering updating to py3?
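For context, a minimal py3 repro of that error and the usual fix (decode the `bytes` line before concatenating); the names below are illustrative, not the actual `load_vocab` code:

```python
line = b"hello\n"   # files opened with "rb" yield bytes in py3

try:
    key = line.strip() + "_id"        # fine in py2, fails in py3
except TypeError as err:
    print(err)                        # can't concat str to bytes

key = line.strip().decode("utf-8") + "_id"   # decode bytes -> str first
print(key)                                   # hello_id
```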