Jack Morris
Yes, the MSMARCO longer-sequence-length dataset included sequences from 1 to 128 tokens
Hi @carriex -- this looks right! I'm pretty sure that's the right model. Can you share the error with me? Or maybe we can work out of a Colab to...
Ok there was something weird with the pre-trained model from HuggingFace which I will look into. For now, I developed a workaround; here's some code that properly loads the hypothesizer...
(The only line I changed was adding this:)

```python
training_args.corrector_model_from_pretrained = "jxm/vec2text__openai_ada002__msmarco__msl128__hypothesizer"
```
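For anyone following along, the workaround just points the corrector-loading logic at the published hypothesizer checkpoint by name. Here's a minimal, self-contained sketch of what that line does; the `SimpleNamespace` stand-in is a simplification, in the real script `training_args` is the script's own training-arguments object:

```python
from types import SimpleNamespace

# Simplified stand-in for the script's training arguments object.
training_args = SimpleNamespace(corrector_model_from_pretrained=None)

# The workaround: point corrector loading at the published hypothesizer
# checkpoint on the HuggingFace Hub instead of resolving it automatically.
training_args.corrector_model_from_pretrained = (
    "jxm/vec2text__openai_ada002__msmarco__msl128__hypothesizer"
)

print(training_args.corrector_model_from_pretrained)
```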
Hmm, the command looks right and the numbers are close but a little low. Oddly the dataset looks different -- I've never seen that example (`"Toonimo Toonimo is a..."`) before....
Yep it should be the last number in the figure, the one you highlighted. And you're right -- it should be the NQ validation set (not MSMARCO, my mistake). Something...
@startakovsky can you be more specific? Which example, and what did you find confusing?