Kurt Shuster

Results 198 comments of Kurt Shuster

The following test, along with several others, failed with this error. Other tests: - `test_chunked_teacher` - `test_distributed_eval_max_exs` - `test_distributed_eval_stream_mode` - `test_distributed_eval_stream_mode_max_exs` - `test_generator_distributed` - `test_multitask_distributed` ``` self = def test_chunked_dynamic_teacher(self):...

Optimal decoding is an open problem in long-form generation. There are several proposed approaches in the literature; we stick with beam search with minimum generation length of 20 and beam/context...

We generally follow the [options specified here](https://github.com/facebookresearch/ParlAI/blob/main/parlai/opt_presets/gen/blenderbot.opt). Specifically:

```
--beam-size 10 \
--beam-context-block-ngram 3 \
--beam-block-ngram 3 \
--beam-block-full-context True
```

Everything else you can leave as default.
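To make the two n-gram blocking flags a bit more concrete, here is a minimal sketch of the idea, not ParlAI's actual implementation. The function name `blocked_next_tokens` and the toy token lists are hypothetical. With n = 3, a candidate next token is blocked if it would complete a trigram that already appears in the lookup source: the dialogue context for `--beam-context-block-ngram`, or the hypothesis itself for `--beam-block-ngram`.

```python
def blocked_next_tokens(source, hypothesis, n):
    """Return tokens that would complete an n-gram already present in `source`.

    Hypothetical helper for illustration only; not part of ParlAI.
    """
    if n <= 0 or len(hypothesis) < n - 1:
        return set()
    # The last n-1 generated tokens form the prefix of the candidate n-gram.
    prefix = tuple(hypothesis[len(hypothesis) - (n - 1):])
    blocked = set()
    for i in range(len(source) - n + 1):
        if tuple(source[i:i + n - 1]) == prefix:
            blocked.add(source[i + n - 1])
    return blocked


context = "i like cats and dogs".split()
hypothesis = "me too , i like".split()

# Context blocking: n-grams are looked up in the dialogue context.
print(blocked_next_tokens(context, hypothesis, 3))     # {'cats'}
# Self blocking: n-grams are looked up in the hypothesis generated so far.
print(blocked_next_tokens(hypothesis, hypothesis, 3))  # set()
```

In a real beam search, the blocked tokens would have their scores set to negative infinity before the next beam step.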

Either is OK. Setting it to `True` will block on all of the context, even the part that doesn't fit in the model's context truncation window (128 tokens for BB2 3B).

Suppose you're in a very long conversation, where there are e.g. 1000 tokens of context. With `--beam-block-full-context False`, the model will only perform beam-blocking on the 128 tokens that fit...
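A small sketch of that difference, under stated assumptions: the helper `context_for_blocking` is hypothetical, and it assumes truncation keeps the most recent tokens. It only shows which part of the context is visible to beam blocking under each setting.

```python
def context_for_blocking(context_tokens, truncate, block_full_context):
    """Return the slice of the context that blocking would look at.

    Hypothetical helper; assumes truncation keeps the most recent tokens.
    """
    if block_full_context:
        # Analogous to --beam-block-full-context True: use everything.
        return list(context_tokens)
    # Analogous to --beam-block-full-context False: only the truncated window.
    return list(context_tokens[-truncate:])


context = [f"tok{i}" for i in range(1000)]  # a very long conversation

print(len(context_for_blocking(context, 128, True)))   # 1000
print(len(context_for_blocking(context, 128, False)))  # 128
```

With the `False` setting, an n-gram that only occurs in the first 872 tokens of this toy context would not be blocked, so the model could repeat it.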

Using nucleus sampling (`--inference nucleus`) with a high p value (`--topp 0.9`) will increase randomness. Same with `--inference topk` with high `--topk`. You can indeed play around with `--temperature` as...
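For intuition on why these settings change randomness, here is a minimal, self-contained sketch of temperature scaling, nucleus (top-p) filtering, and top-k filtering over a toy distribution. This is a generic illustration, not ParlAI's code; all function names and the example probabilities are made up.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_candidates(probs, topp):
    """Smallest set of most likely token ids whose total mass reaches topp."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= topp:
            break
    return kept


def topk_candidates(probs, k):
    """The k most likely token ids."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def sample(probs, candidates, rng):
    """Sample one token id from the candidates, renormalizing their mass."""
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
print(nucleus_candidates(probs, 0.9))  # [0, 1, 2]
print(nucleus_candidates(probs, 0.4))  # [0]
print(topk_candidates(probs, 2))       # [0, 1]

rng = random.Random(0)
token = sample(probs, nucleus_candidates(probs, 0.9), rng)
```

A larger p or k keeps more candidate tokens, and a higher temperature flattens the distribution before filtering, so each of them makes less likely tokens easier to sample.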

closing for now, please reopen if there are further questions

could you also rebase on main to see if your teacher tests pass?

We recently added CUDA kernels for GPU beam blocking; however, these have not been tested on Windows. You can perhaps try removing the offending files to complete installation but I...

perhaps cc @pearlli98 if you have any suggestions for why it might fail building on Windows